Scenario: Interning at the Global Risk Observatory (GRO)

You’ve just landed a midterm internship at the Global Risk Observatory (GRO), an international think tank that advises governments and humanitarian agencies on disaster preparedness and economic resilience. Your team is tasked with analyzing a newly released dataset: the Global Earthquake-Tsunami Risk Assessment Dataset, which contains seismic and tsunami-related data from 782 significant earthquakes worldwide between 2001 and 2022.

Your supervisor has asked you to process and analyze this dataset using the R programming language to generate insights that can inform economic policy, infrastructure investment, and early warning systems.

Dataset Description

Variable metadata and tsunami relevance
| Feature | Type | Description | Range / Values | Tsunami relevance |
| --- | --- | --- | --- | --- |
| magnitude | Float | Earthquake magnitude (Richter scale) | 6.5 - 9.1 | High: primary tsunami predictor |
| cdi | Integer | Community Decimal Intensity (felt intensity) | 0 - 9 | Medium: population impact measure |
| mmi | Integer | Modified Mercalli Intensity (instrumental) | 1 - 9 | Medium: structural damage indicator |
| sig | Integer | Event significance score | 650 - 2910 | High: overall hazard assessment |
| nst | Integer | Number of seismic monitoring stations | 0 - 934 | Low: data quality indicator |
| dmin | Float | Distance to nearest seismic station (degrees) | 0.0 - 17.7 | Low: location precision |
| gap | Float | Azimuthal gap between stations (degrees) | 0.0 - 239.0 | Low: location reliability |
| depth | Float | Earthquake focal depth (km) | 2.7 - 670.8 | High: shallow = higher tsunami risk |
| latitude | Float | Epicenter latitude (WGS84) | −61.85° to 71.63° | High: ocean proximity indicator |
| longitude | Float | Epicenter longitude (WGS84) | −179.97° to 179.66° | High: ocean proximity indicator |
| Year | Integer | Year of occurrence | 2001 - 2022 | Medium: temporal patterns |
| Month | Integer | Month of occurrence | 1 - 12 | Low: seasonal analysis |
| tsunami | Binary | Tsunami potential (TARGET) | 0, 1 | Target variable |
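
The variable types listed above can be declared when the CSV is read, so that a mistyped column is caught early. The following is a minimal sketch, not part of the official midterm script: the two sample rows are made up, the temporary file stands in for the real dataset, and the names `csv_path`, `col_types`, and `quakes` are illustrative assumptions.

```r
# Made-up two-row sample written to a temporary file so the sketch runs
# without the real dataset. Replace `csv_path` with the path to the actual CSV.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "magnitude,cdi,mmi,sig,nst,dmin,gap,depth,latitude,longitude,Year,Month,tsunami",
  "7.0,8,7,770,120,0.5,17.0,14.0,-9.80,159.60,2022,11,1",
  "6.9,4,4,735,100,2.2,34.0,25.0,-4.95,100.74,2022,11,0"
), csv_path)

# Column types taken from the variable table (Float -> numeric).
# Assumption: integer columns are stored without a decimal point; if the real
# file stores them as e.g. "8.0", read them as "numeric" instead.
col_types <- c(
  magnitude = "numeric", cdi = "integer", mmi = "integer", sig = "integer",
  nst = "integer", dmin = "numeric", gap = "numeric", depth = "numeric",
  latitude = "numeric", longitude = "numeric", Year = "integer",
  Month = "integer", tsunami = "integer"
)

quakes <- read.csv(csv_path, colClasses = col_types)

# Basic sanity checks against the documented ranges.
stopifnot(all(quakes$Month %in% 1:12), all(quakes$tsunami %in% c(0L, 1L)))

# Turn the binary target into a labelled factor for readable plots and tables.
quakes$tsunami <- factor(quakes$tsunami, levels = c(0, 1),
                         labels = c("No tsunami", "Tsunami"))

str(quakes)
```

Declaring the types up front is one option among several; reading with the defaults and checking the result with str() works just as well for a dataset of this size.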

Your mission: complete the following tasks.

Use R and ggplot2 to complete each task. Submit your code and visual outputs with brief explanations.

  1. Load and inspect the dataset
    • Load the CSV file into R.
    • Use str(), summary(), and head() to inspect the structure and contents.
    • Identify the number of tsunami vs. non-tsunami events.
  2. Earthquake magnitude distribution
    • Create a histogram of earthquake magnitudes using ggplot2.
    • Add appropriate axis labels and title.
    • Briefly describe the distribution.
  3. Tsunami event proportion
    • Create a bar chart showing the count of tsunami vs. non-tsunami events.
    • Use ggplot2 and customize colors.
    • What percentage of events are tsunami-related?
  4. Earthquake frequency by year
    • Create a line plot (or column chart) showing the number of earthquakes per year.
    • Use geom_line() or geom_col() with Year on the x-axis.
    • Identify any years with spikes in activity.
  5. Latitude vs. longitude plot
    • Create a scatter plot of earthquake epicenters using latitude and longitude.
    • Color-code points by tsunami potential (0 or 1).
    • What regions appear most tsunami-prone?
  6. Depth vs. magnitude
    • Create a scatter plot of earthquake depth vs. magnitude.
    • Use color or shape to distinguish tsunami events.
    • Discuss any patterns you observe.
  7. Intensity comparison
    • Use boxplots to compare cdi (Community Decimal Intensity) between tsunami and non-tsunami events.
    • What does this suggest about population impact?
  8. Annotated insight
    • Choose one plot and add annotations using geom_text() or geom_label() to highlight key insights.
    • Explain why this insight matters for economic planning.
  9. Save your visuals
    • Save at least three plots as PNG files using ggsave().
    • Include filenames and dimensions in your code.
  10. Reflection
    • Write a short reflection (150–200 words) on how data visualization can support disaster resilience and economic decision-making.
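
The general pattern behind the tasks above (inspect, count, plot, annotate, save) can be sketched on a small stand-in data frame. This is a hedged illustration under stated assumptions, not a model answer: the data are randomly generated rather than taken from the real dataset, and the object names (`quakes`, `count_df`, `p_bar`, `out_file`), colors, and file name are arbitrary choices.

```r
library(ggplot2)

# Made-up stand-in for the real dataset (values are illustrative only).
set.seed(1)
quakes <- data.frame(
  magnitude = round(runif(40, 6.5, 9.1), 1),
  depth     = round(runif(40, 2.7, 670.8), 1),
  Year      = sample(2001:2022, 40, replace = TRUE),
  tsunami   = rep(c(0L, 1L), times = c(25, 15))
)

# Inspect structure and contents.
str(quakes)
summary(quakes)
head(quakes)

# Label the binary target so axes and legends are readable.
quakes$tsunami_label <- factor(quakes$tsunami, levels = c(0, 1),
                               labels = c("No tsunami", "Tsunami"))

# Count events per class and compute each class's share of all events.
count_df <- as.data.frame(table(tsunami_label = quakes$tsunami_label))
count_df$pct_label <- sprintf("%.1f%%", 100 * count_df$Freq / sum(count_df$Freq))

# Bar chart with custom colors and a text annotation above each bar.
p_bar <- ggplot(count_df, aes(x = tsunami_label, y = Freq, fill = tsunami_label)) +
  geom_col() +
  geom_text(aes(label = pct_label), vjust = -0.5) +
  scale_fill_manual(values = c("No tsunami" = "grey60", "Tsunami" = "steelblue")) +
  labs(title = "Events by tsunami potential (synthetic data)",
       x = "Event type", y = "Number of events") +
  theme_minimal() +
  theme(legend.position = "none")

# Save with an explicit file name and dimensions (inches by default).
out_file <- file.path(tempdir(), "tsunami_counts.png")
ggsave(out_file, plot = p_bar, width = 6, height = 4, dpi = 300)
```

The same build-then-save structure carries over to the histogram, line, scatter, and boxplot tasks by swapping the geom and the aesthetics; when working with the real data, write the PNG files to your project folder rather than tempdir().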

Midterm files and submission:

  1. Access the midterm R script and dataset from this link: Midterm Project Files

  2. Submit your midterm exam answer sheet (R script with code, plots, and explanations) via Google Form: Midterm Submission Form

  3. Deadline: March 31, 2026, 11:59 PM PST