Scenario: Interning at the Global Risk Observatory (GRO)
You’ve just landed a midterm internship at the Global Risk Observatory (GRO), an international think tank that advises governments and humanitarian agencies on disaster preparedness and economic resilience. Your team is tasked with analyzing a newly released dataset: the Global Earthquake-Tsunami Risk Assessment Dataset, which contains seismic and tsunami-related data from 782 significant earthquakes worldwide between 2001 and 2022.
Your supervisor has asked you to process and analyze this dataset using R to generate insights that can inform economic policy, infrastructure investment, and early warning systems.
Dataset Description

| Feature | Type | Description | Range / Values | Tsunami relevance |
|---|---|---|---|---|
| magnitude | Float | Earthquake magnitude (Richter scale) | 6.5 - 9.1 | High: Primary tsunami predictor |
| cdi | Integer | Community Decimal Intensity (felt intensity) | 0 - 9 | Medium: Population impact measure |
| mmi | Integer | Modified Mercalli Intensity (instrumental) | 1 - 9 | Medium: Structural damage indicator |
| sig | Integer | Event significance score | 650 - 2910 | High: Overall hazard assessment |
| nst | Integer | Number of seismic monitoring stations | 0 - 934 | Low: Data quality indicator |
| dmin | Float | Distance to nearest seismic station (degrees) | 0.0 - 17.7 | Low: Location precision |
| gap | Float | Azimuthal gap between stations (degrees) | 0.0 - 239.0 | Low: Location reliability |
| depth | Float | Earthquake focal depth (km) | 2.7 - 670.8 | High: Shallow = higher tsunami risk |
| latitude | Float | Epicenter latitude (WGS84) | −61.85° to 71.63° | High: Ocean proximity indicator |
| longitude | Float | Epicenter longitude (WGS84) | −179.97° to 179.66° | High: Ocean proximity indicator |
| Year | Integer | Year of occurrence | 2001 - 2022 | Medium: Temporal patterns |
| Month | Integer | Month of occurrence | 1 - 12 | Low: Seasonal analysis |
| tsunami | Binary | Tsunami potential (TARGET) | 0, 1 | Target variable |

Your mission: complete the following tasks.
Use R and ggplot2 to complete each task. Submit your code and visual outputs with brief explanations.
- Load and inspect the dataset
  - Load the CSV file into R.
  - Use `str()`, `summary()`, and `head()` to inspect the structure and contents.
  - Identify the number of tsunami vs. non-tsunami events.
- Earthquake magnitude distribution
  - Create a histogram of earthquake magnitudes using `ggplot2`.
  - Add appropriate axis labels and title.
  - Briefly describe the distribution.
- Tsunami event proportion
  - Create a bar chart showing the count of tsunami vs. non-tsunami events.
  - Use `ggplot2` and customize colors.
  - What percentage of events are tsunami-related?
- Earthquake frequency by year
  - Create a line plot showing the number of earthquakes per year.
  - Use `geom_line()` or `geom_col()` with year as the x-axis.
  - Identify any years with spikes in activity.
- Latitude vs. longitude plot
  - Create a scatter plot of earthquake epicenters using latitude and longitude.
  - Color-code points by tsunami potential (0 or 1).
  - What regions appear most tsunami-prone?
- Depth vs. magnitude
  - Create a scatter plot of earthquake depth vs. magnitude.
  - Use color or shape to distinguish tsunami events.
  - Discuss any patterns you observe.
- Intensity comparison
  - Use boxplots to compare cdi (Community Decimal Intensity) between tsunami and non-tsunami events.
  - What does this suggest about population impact?
- Annotated insight
  - Choose one plot and add annotations using `geom_text()` or `geom_label()` to highlight key insights.
  - Explain why this insight matters for economic planning.
- Save your visuals
  - Save at least three plots as PNG files using `ggsave()`.
  - Include filenames and dimensions in your code.
- Reflection
  - Write a short reflection (150–200 words) on how data visualization can support disaster resilience and economic decision-making.
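As a starting point only, the general pattern behind the first few tasks and the saving step might look like the sketch below. It runs on a tiny made-up data frame (`quakes`) rather than the real CSV, so the numbers it produces mean nothing; the column names `magnitude`, `depth`, `Year`, and `tsunami` follow the dataset description, while the object names, colors, bin width, and PNG filenames are assumptions chosen for illustration. With the real file you would replace the toy data frame with a `read.csv()` call on the dataset you downloaded.

```r
library(ggplot2)

# Hypothetical toy data standing in for the real dataset (values are invented).
# With the actual file, load it instead, e.g. quakes <- read.csv(<path to your CSV>)
quakes <- data.frame(
  magnitude = c(6.5, 6.7, 7.0, 7.3, 6.9, 8.1, 6.6, 7.6),
  depth     = c(10, 35, 600, 20, 15, 25, 120, 40),
  Year      = c(2001, 2001, 2005, 2010, 2010, 2011, 2015, 2022),
  tsunami   = c(0, 0, 0, 1, 0, 1, 0, 1)
)

# Inspect structure, summary statistics, and the first rows
str(quakes)
summary(quakes)
head(quakes)

# Count tsunami vs. non-tsunami events and convert the counts to percentages
tsunami_counts <- table(quakes$tsunami)
tsunami_pct <- round(100 * prop.table(tsunami_counts), 1)
print(tsunami_counts)
print(tsunami_pct)

# A labelled factor gives readable axis text and legend entries
quakes$tsunami_label <- factor(
  quakes$tsunami,
  levels = c(0, 1),
  labels = c("No tsunami", "Tsunami")
)

# Histogram of magnitudes with axis labels and a title
p_hist <- ggplot(quakes, aes(x = magnitude)) +
  geom_histogram(binwidth = 0.25, fill = "steelblue", colour = "white") +
  labs(
    title = "Distribution of earthquake magnitudes",
    x = "Magnitude",
    y = "Number of events"
  )

# Bar chart of event counts with manually chosen colors
p_bar <- ggplot(quakes, aes(x = tsunami_label, fill = tsunami_label)) +
  geom_bar() +
  scale_fill_manual(values = c("No tsunami" = "grey60", "Tsunami" = "firebrick")) +
  labs(
    title = "Tsunami vs. non-tsunami events",
    x = NULL,
    y = "Number of events"
  ) +
  theme(legend.position = "none")

# Save plots as PNG with explicit filenames and dimensions (in inches)
ggsave("magnitude_hist.png", plot = p_hist, width = 7, height = 5, units = "in", dpi = 300)
ggsave("tsunami_bar.png", plot = p_bar, width = 6, height = 4, units = "in", dpi = 300)
```

The same `ggplot(...) + geom_*() + labs(...)` structure carries over to the remaining plots: swap in `geom_point()` for the scatter plots, `geom_boxplot()` for the intensity comparison, and map `tsunami_label` to `colour` or `shape` to separate the two groups.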
Midterm files and submission:
Access the midterm R script and dataset from this link: Midterm Project Files
Submit your midterm exam answer sheet (R script with code, plots, and explanations) via Google Form: Midterm Submission Form
Deadline: March 31, 2026, 11:59 PM PST