
Scenario: Interning at the Global Risk Observatory (GRO)
You’ve just landed a midterm internship at the Global Risk Observatory (GRO), an international think tank that advises governments and humanitarian agencies on disaster preparedness and economic resilience. Your team is tasked with analyzing a newly released dataset: the Global Earthquake-Tsunami Risk Assessment Dataset, which contains seismic and tsunami-related data from 782 significant earthquakes worldwide between 2001 and 2022.
Your supervisor has asked you to process and analyze this dataset using the R programming language to generate insights that can inform economic policy, infrastructure investment, and early warning systems.
Dataset Description
Variable metadata and tsunami relevance
| Variable | Type | Description | Range | Tsunami relevance |
|---|---|---|---|---|
| magnitude | Float | Earthquake magnitude (typically moment magnitude, Mw) | 6.5 to 9.1 | High (primary tsunami predictor) |
| cdi | Integer | Community Decimal Intensity (reported felt intensity) | 0 to 9 | Medium (population impact measure) |
| mmi | Integer | Modified Mercalli Intensity (instrumental estimate) | 1 to 9 | Medium (structural damage indicator) |
| sig | Integer | Event significance score (higher = more significant) | 650 to 2910 | High (overall hazard assessment) |
| nst | Integer | Number of seismic stations used to locate the event | 0 to 934 | Low (data quality indicator) |
| dmin | Float | Distance from the epicenter to the nearest seismic station (degrees) | 0.0 to 17.7 | Low (location precision) |
| gap | Float | Largest azimuthal gap between adjacent stations (degrees) | 0.0 to 239.0 | Low (location reliability) |
| depth | Float | Earthquake focal depth (km) | 2.7 to 670.8 | High (shallower events carry higher tsunami risk) |
| latitude | Float | Epicenter latitude (WGS84) | −61.85° to 71.63° | High (ocean proximity indicator) |
| longitude | Float | Epicenter longitude (WGS84) | −179.97° to 179.66° | High (ocean proximity indicator) |
| Year | Integer | Year of occurrence | 2001 to 2022 | Medium (temporal patterns) |
| Month | Integer | Month of occurrence | 1 to 12 | Low (seasonal analysis) |
| tsunami | Binary | Tsunami potential (TARGET) | 0, 1 | Target variable |
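Because tsunami is stored as a 0/1 integer, it usually helps to recode it as a labeled factor before counting or plotting it, so that tables and legends read "Tsunami" rather than "1". The sketch below is a minimal illustration under stated assumptions: the short tsunami vector is made up and only stands in for the real column, and the labels "Non-tsunami" and "Tsunami" are illustrative choices.

```r
# Toy stand-in for the dataset's `tsunami` column (values are illustrative only)
tsunami <- c(0, 1, 1, 0, 0, 1, 0, 0)

# Recode the 0/1 target as a labeled factor so tables, legends, and fills read clearly
tsunami_f <- factor(tsunami, levels = c(0, 1), labels = c("Non-tsunami", "Tsunami"))

counts <- table(tsunami_f)                    # number of events in each class
pct    <- round(100 * prop.table(counts), 1)  # share of events in each class, in percent

print(counts)
print(pct)
```

With the real data, the same factor() call would be applied to the tsunami column of whatever data frame you load the CSV into; prop.table() then gives the share of tsunami-related events directly.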
Your mission is to complete the following tasks.
Use R and ggplot2 to complete each task. Submit your code and visual outputs with brief explanations.
- Load and inspect the dataset
  - Load the CSV file into R.
  - Use str(), summary(), and head() to inspect its structure and contents.
  - Identify the number of tsunami vs. non-tsunami events.
- Earthquake magnitude distribution
  - Create a histogram of earthquake magnitudes using ggplot2.
  - Add appropriate axis labels and a title.
  - Briefly describe the distribution.
- Tsunami event proportion
  - Create a bar chart showing the count of tsunami vs. non-tsunami events.
  - Use ggplot2 and customize the bar colors.
  - What percentage of events are tsunami-related?
- Earthquake frequency by year
  - Create a plot showing the number of earthquakes per year, with Year on the x-axis.
  - Use geom_line() for a line plot or geom_col() for a bar chart.
  - Identify any years with spikes in activity.
- Latitude vs. longitude plot
  - Create a scatter plot of earthquake epicenters, with longitude on the x-axis and latitude on the y-axis.
  - Color-code points by tsunami potential (0 or 1).
  - Which regions appear most tsunami-prone?
- Depth vs. magnitude
  - Create a scatter plot of earthquake depth vs. magnitude.
  - Use color or shape to distinguish tsunami events.
  - Discuss any patterns you observe.
- Intensity comparison
  - Use boxplots to compare cdi (Community Decimal Intensity) between tsunami and non-tsunami events.
  - What does this suggest about population impact?
- Annotated insight
  - Choose one plot and add annotations using geom_text() or geom_label() to highlight key insights.
  - Explain why this insight matters for economic planning.
- Save your visuals
  - Save at least three plots as PNG files using ggsave().
  - Specify the filename and dimensions (width, height, and units) in each ggsave() call.
- Reflection
  - Write a short reflection (150–200 words) on how data visualization can support disaster resilience and economic decision-making.
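As a hedged illustration of the ggsave() step in the task list, here is a minimal sketch that builds a histogram from a few made-up magnitudes and saves it with an explicit filename and explicit dimensions. The data frame quakes_toy, the filename magnitude_histogram.png, the bin width, and the 8 by 5 inch, 300 dpi size are all illustrative assumptions, not requirements of the assignment; with the real data you would substitute your own loaded data frame and filenames.

```r
library(ggplot2)

# Toy data standing in for the real dataset (values are illustrative only)
quakes_toy <- data.frame(magnitude = c(6.5, 6.6, 6.8, 7.1, 7.4, 7.9, 8.3))

p <- ggplot(quakes_toy, aes(x = magnitude)) +
  geom_histogram(binwidth = 0.25, fill = "steelblue", colour = "white") +
  labs(title = "Earthquake magnitude distribution (toy data)",
       x = "Magnitude", y = "Number of earthquakes")

# Hypothetical filename; writing to tempdir() avoids cluttering the working directory
out_file <- file.path(tempdir(), "magnitude_histogram.png")

# Explicit dimensions: 8 x 5 inches at 300 dpi (2400 x 1500 pixels)
ggsave(filename = out_file, plot = p, width = 8, height = 5, units = "in", dpi = 300)
```

Passing width, height, units, and dpi explicitly keeps the saved image size reproducible; when width and height are left unset, ggsave() falls back to the size of the current graphics device, which varies between sessions.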
Midterm files and submission:
Access the midterm R script and dataset from this link: Midterm Project Files
Submit your midterm exam answer sheet (an R script with code, plots, and explanations) via Google Form: Midterm Submission Form
Deadline: March 31, 2026, 11:59 PM PST