Custering-Analysis-in-R

This assignment investigates whether there are regional patterns in the spread of COVID-19 by applying cluster analysis to country-level COVID-19 data from the Our World in Data website. The dataset has 30 variables related to COVID-19 cases for 208 countries, collected from the start of the pandemic to 2 September 2020. The analysis was done in R using three clustering algorithms: hierarchical clustering, k-means, and Partitioning Around Medoids (PAM). A model of six clusters was built, and silhouette plots were used to assess the quality of the clustering. The hierarchical clustering model produced the highest average silhouette width, $\color{red}{\text{0.85}}$. Because countries on the same continent have been affected differently by the virus, the clustering models did not group countries by region. Instead, countries were clustered according to how they have been hit by the coronavirus pandemic.
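The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the assignment's actual code: the data matrix `x` is synthetic random data standing in for the scaled country-level variables, the average linkage for the hierarchical model is an assumption (the text does not say which linkage was used), and the helper `avg_sil` is a hypothetical name introduced here. Because the data is random, the silhouette widths it prints will not resemble the 0.85 reported for the real data.

```r
library(cluster)  # provides pam() and silhouette()

set.seed(42)
# Synthetic stand-in for the country data: 60 rows, 4 numeric variables.
# The real analysis would use the scaled COVID-19 indicators instead.
x <- scale(matrix(rnorm(60 * 4), ncol = 4))
d <- dist(x)  # Euclidean distances between rows
k <- 6        # number of clusters, as stated in the text

# Hierarchical clustering (average linkage is an assumption), cut into k groups
hc_labels <- cutree(hclust(d, method = "average"), k = k)

# k-means with several random starts to reduce sensitivity to initialisation
km_labels <- kmeans(x, centers = k, nstart = 25)$cluster

# Partitioning Around Medoids on the same distance matrix
pam_labels <- pam(d, k = k)$clustering

# Hypothetical helper: average silhouette width for a vector of cluster labels
avg_sil <- function(labels) mean(silhouette(labels, d)[, "sil_width"])

widths <- c(hierarchical = avg_sil(hc_labels),
            kmeans       = avg_sil(km_labels),
            pam          = avg_sil(pam_labels))
print(round(widths, 3))
```

Computing the average silhouette width on the same distance matrix for all three models keeps the comparison like-for-like; calling `plot()` on a `silhouette` object draws the per-observation silhouette plot.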
