SanghyunKim1 / Data-Science-Capstone-Project

COVID-19 Country Clustering Analysis

Data-Science-Capstone-Project

In this interdisciplinary data science capstone project, my group members and I created a digital dashboard in Tableau that shows users groups of countries, clustered on country-level COVID-19 data, QS university rankings, and socioeconomic index data.

COVID-19 Digital Dashboard

Link to our COVID-19 digital dashboard: Oabroad

Final Report

Link to the final report: Final Report

Aim

With travel restrictions loosening, we aim to help high school and undergraduate students make better decisions when choosing a university for overseas study.
Because our country clustering system takes into account (1) COVID-19 spread, (2) country socioeconomic index, and (3) university rankings by subject, we expect our digital dashboard to help students find the countries best suited to their studies.

Individual Contribution - COVID-19 Country Clustering

As the group's data scientist, I imputed missing COVID-19 data using Multiple Imputation by Chained Equations (MICE).
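The idea behind chained-equations imputation can be illustrated with a minimal sketch. This is not the project's actual code: it assumes only two numeric columns, uses simple linear regression with residual noise as the imputation model, and the function names (`fit_line`, `chained_imputation`, `mice`) and sample values are hypothetical. It also pools by averaging the completed datasets, which is a simplification; a full MICE workflow would draw the regression parameters as well and pool analysis results rather than data.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit ys ~ a + b * xs; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / max(len(resid) - 2, 1)) ** 0.5
    return a, b, sd


def chained_imputation(rows, n_iter=10, seed=0):
    """Run one imputation chain over rows of [x, y]; None marks a missing value."""
    rng = random.Random(seed)
    missing = [[v is None for v in row] for row in rows]
    data = [list(row) for row in rows]
    # Step 1: fill every gap with the column mean as a starting guess.
    for j in range(2):
        mean_j = statistics.fmean(row[j] for row in rows if row[j] is not None)
        for row in data:
            if row[j] is None:
                row[j] = mean_j
    # Step 2: cycle through the columns, regressing each one on the other
    # and replacing its originally missing cells with a noisy prediction.
    for _ in range(n_iter):
        for target in range(2):
            pred = 1 - target
            train = [(row[pred], row[target])
                     for row, m in zip(data, missing) if not m[target]]
            a, b, sd = fit_line([p for p, _ in train], [t for _, t in train])
            for row, m in zip(data, missing):
                if m[target]:
                    row[target] = a + b * row[pred] + rng.gauss(0.0, sd)
    return data


def mice(rows, n_imputations=5, n_iter=10):
    """Run several chains with different seeds and average them cell by cell."""
    chains = [chained_imputation(rows, n_iter, seed=s) for s in range(n_imputations)]
    return [[statistics.fmean(c[i][j] for c in chains) for j in range(2)]
            for i in range(len(rows))]


# Hypothetical sample: two related series with one gap in each column.
rows = [[1.0, 2.1], [2.0, None], [3.0, 6.2], [None, 8.1], [5.0, 9.9], [6.0, 12.2]]
completed = mice(rows)
for row in completed:
    print([round(v, 2) for v in row])
```

Observed values pass through unchanged, and only the two missing cells are replaced by estimates that vary with the seed of each chain.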
To cluster countries based on COVID-19 data, I selected the following three COVID-19 data features based on our user research and domain knowledge: 1. New cases smoothed per million, 2. New deaths smoothed per million, and 3. Stringency index.
With these selected variables, I computed a Dynamic Time Warping (DTW) distance matrix to measure how similar in shape each pair of time series is. Using this DTW distance matrix, I clustered countries with a hierarchical clustering algorithm using complete linkage. The dendrogram below shows the resulting COVID-19 country clusters.
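The two steps described above, a DTW distance matrix followed by complete-linkage hierarchical clustering, can be sketched in plain Python. This is an illustrative sketch rather than the project's implementation: it assumes one univariate series per country (the project used three features), uses the absolute difference as the local cost, and the country labels, values, and function names are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def distance_matrix(series):
    """Pairwise DTW distances between all series, in list order."""
    return [[dtw_distance(p, q) for q in series] for p in series]


def complete_linkage(dist, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    farthest pair of members is closest, until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] += clusters.pop(y)
    return clusters


# Hypothetical series: A and B share a single-peak shape shifted in time,
# C and D share a steady rise (and differ in length).
names = ["A", "B", "C", "D"]
series = [
    [0, 1, 3, 1, 0, 0],
    [0, 0, 1, 3, 1, 0],
    [0, 2, 4, 6, 8, 10],
    [0, 0, 2, 4, 6, 8, 10],
]
dist = distance_matrix(series)
clusters = complete_linkage(dist, n_clusters=2)
print([[names[i] for i in c] for c in clusters])  # [['A', 'B'], ['C', 'D']]
```

Because DTW aligns series before comparing them, the time-shifted pairs end up at distance zero and are merged first, which is why a shape-based distance suits epidemic curves that peak at different dates. Cutting the merge sequence at a different `n_clusters` corresponds to cutting a dendrogram at a different height.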

Acknowledgement

Since this was a group project, I would like to thank my group members: Christopher Tong, Ann Munkhbayar, Lawrence Chen, Chengyi Jin, and Xulin Wang.

