COVID-19-dataset-and-World-Happiness-Report-Analysis

I analyzed the COVID-19 dataset published by Johns Hopkins University together with the World Happiness Report and found some interesting results. The analysis suggests that countries with higher development indicators (GDP per capita, social support, healthy life expectancy) tend to report higher peak daily infection counts than less developed countries, although the correlations are weak (about 0.22 or lower).

Welcome to the Covid 19 Data Analysis Notebook

Author : Poshan Pandey

Date : 6/5/2020

Let's import the modules

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
print("All modules imported!")

All modules imported!

Let's import the Covid 19 dataset published by Johns Hopkins University

https://github.com/CSSEGISandData/COVID-19

corona_dataset_csv = pd.read_csv("Datasets/time_series_covid19_confirmed_global.csv")

corona_dataset_csv.head()

	Province/State	Country/Region	Lat	Long	...	5/26/20	5/27/20	5/28/20	5/29/20	5/30/20	5/31/20	6/1/20	6/2/20	6/3/20	6/4/20
0	NaN	Afghanistan	33.0000	65.0000	...	11831	12456	13036	13659	14525	15205	15750	16509	17267	18054
1	NaN	Albania	41.1533	20.1683	...	1029	1050	1076	1099	1122	1137	1143	1164	1184	1197
2	NaN	Algeria	28.0339	1.6596	...	8697	8857	8997	9134	9267	9394	9513	9626	9733	9831
3	NaN	Andorra	42.5063	1.5218	...	763	763	763	764	764	764	765	844	851	852
4	NaN	Angola	-11.2027	17.8739	...	70	71	74	81	84	86	86	86	86	86

5 rows × 139 columns

Checking the shape of the data

corona_dataset_csv.shape

(266, 139)

Deleting unnecessary columns

corona_dataset_csv.drop(["Lat", "Long"], axis = 1, inplace = True)
corona_dataset_csv.head(15)

	Province/State	Country/Region	1/26/20	1/27/20	1/28/20	1/29/20	...	5/26/20	5/27/20	5/28/20	5/29/20	5/30/20	5/31/20	6/1/20	6/2/20	6/3/20	6/4/20
0	NaN	Afghanistan	0	0	0	0	...	11831	12456	13036	13659	14525	15205	15750	16509	17267	18054
1	NaN	Albania	0	0	0	0	...	1029	1050	1076	1099	1122	1137	1143	1164	1184	1197
2	NaN	Algeria	0	0	0	0	...	8697	8857	8997	9134	9267	9394	9513	9626	9733	9831
3	NaN	Andorra	0	0	0	0	...	763	763	763	764	764	764	765	844	851	852
4	NaN	Angola	0	0	0	0	...	70	71	74	81	84	86	86	86	86	86
5	NaN	Antigua and Barbuda	0	0	0	0	...	25	25	25	25	25	26	26	26	26	26
6	NaN	Argentina	0	0	0	0	...	13228	13933	14702	15419	16214	16851	17415	18319	19268	20197
7	NaN	Armenia	0	0	0	0	...	7402	7774	8216	8676	8927	9282	9492	10009	10524	11221
8	Australian Capital Territory	Australia	0	0	0	0	...	107	107	107	107	107	107	107	107	107	107
9	New South Wales	Australia	3	4	4	4	...	3089	3090	3092	3092	3095	3098	3104	3104	3106	3110
10	Northern Territory	Australia	0	0	0	0	...	29	29	29	29	29	29	29	29	29	29
11	Queensland	Australia	0	0	0	1	...	1058	1058	1058	1058	1058	1058	1059	1059	1060	1060
12	South Australia	Australia	0	0	0	0	...	440	440	440	440	440	440	440	440	440	440
13	Tasmania	Australia	0	0	0	0	...	228	228	228	228	228	228	228	228	228	228
14	Victoria	Australia	1	1	1	1	...	1618	1628	1634	1645	1649	1653	1663	1670	1678	1681

15 rows × 137 columns

Aggregating the data of all provinces/states belonging to the same country

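# numeric_only=True skips the text column Province/State, which recent pandas versions no longer drop silently
aggregated_corona_dataset = corona_dataset_csv.groupby("Country/Region").sum(numeric_only=True)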
aggregated_corona_dataset.head(10)

	1/22/20	1/23/20	1/24/20	1/25/20	1/26/20	1/27/20	1/28/20	1/29/20	1/30/20	1/31/20	...	5/26/20	5/27/20	5/28/20	5/29/20	5/30/20	5/31/20	6/1/20	6/2/20	6/3/20	6/4/20
Country/Region
Afghanistan	0	0	0	0	0	0	0	0	0	0	...	11831	12456	13036	13659	14525	15205	15750	16509	17267	18054
Albania	0	0	0	0	0	0	0	0	0	0	...	1029	1050	1076	1099	1122	1137	1143	1164	1184	1197
Algeria	0	0	0	0	0	0	0	0	0	0	...	8697	8857	8997	9134	9267	9394	9513	9626	9733	9831
Andorra	0	0	0	0	0	0	0	0	0	0	...	763	763	763	764	764	764	765	844	851	852
Angola	0	0	0	0	0	0	0	0	0	0	...	70	71	74	81	84	86	86	86	86	86
Antigua and Barbuda	0	0	0	0	0	0	0	0	0	0	...	25	25	25	25	25	26	26	26	26	26
Argentina	0	0	0	0	0	0	0	0	0	0	...	13228	13933	14702	15419	16214	16851	17415	18319	19268	20197
Armenia	0	0	0	0	0	0	0	0	0	0	...	7402	7774	8216	8676	8927	9282	9492	10009	10524	11221
Australia	0	0	0	0	4	5	5	6	9	9	...	7139	7150	7165	7184	7192	7202	7221	7229	7240	7247
Austria	0	0	0	0	0	0	0	0	0	0	...	16557	16591	16628	16655	16685	16731	16733	16759	16771	16805

10 rows × 135 columns
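As a minimal sketch of what this aggregation does, the toy frame below (made-up values, not taken from the dataset) has two province rows for a hypothetical country X. Grouping by Country/Region and summing collapses them into one row per country.

```python
import pandas as pd

# Toy data with hypothetical values, shaped like the Johns Hopkins time series
toy = pd.DataFrame({
    "Province/State": ["A", "B", None],
    "Country/Region": ["X", "X", "Y"],
    "1/22/20": [1, 2, 5],
    "1/23/20": [3, 4, 6],
})

# Sum the numeric date columns per country; the text column is skipped
toy_agg = toy.groupby("Country/Region").sum(numeric_only=True)
print(toy_agg)
```

Country X ends up with the sum of its two provinces for every date, which is the same thing that happens to the Australian states in the real data.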

Visualizing the Corona Infection data of Nepal

aggregated_corona_dataset.loc["Nepal"].plot()
plt.title("Rate of Covid 19 Growth in Nepal")
plt.legend()

<matplotlib.legend.Legend at 0x21a2bd54e88>

Calculating the first difference of the above curve (the number of new cases per day) and finding the maximum infection rate

aggregated_corona_dataset.loc["Nepal"].diff().plot()

<matplotlib.axes._subplots.AxesSubplot at 0x21a2be86408>

aggregated_corona_dataset.loc["Nepal"].diff().max()

334.0
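To make the step above concrete, here is a small sketch on a made-up cumulative series (the values and the day labels d1 to d5 are hypothetical). diff() turns cumulative totals into new cases per day, and max() picks the largest daily jump.

```python
import pandas as pd

# Hypothetical cumulative case counts for five consecutive days
cumulative = pd.Series([0, 1, 4, 4, 10], index=["d1", "d2", "d3", "d4", "d5"])

# New cases per day; the first entry is NaN because there is no previous day
daily_new = cumulative.diff()

peak = daily_new.max()         # largest single-day increase (NaN is ignored)
peak_day = daily_new.idxmax()  # label of the day on which it happened
print(daily_new)
print(peak, peak_day)
```

The result is a float because diff() introduces a NaN, which is why the Nepal value above is shown as 334.0 rather than 334.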

Finding the maximum infection rate for every country and adding it as a new column to the DataFrame

countries = list(aggregated_corona_dataset.index)
max_infection_rates = []
for c in countries:
    max_infection_rates.append(aggregated_corona_dataset.loc[c].diff().max())
aggregated_corona_dataset["max_infection_rates"] = max_infection_rates
aggregated_corona_dataset.head()

	1/22/20	1/23/20	1/24/20	1/25/20	1/26/20	1/27/20	1/28/20	1/29/20	1/30/20	1/31/20	...	5/27/20	5/28/20	5/29/20	5/30/20	5/31/20	6/1/20	6/2/20	6/3/20	6/4/20	max_infection_rates
Country/Region
Afghanistan	0	0	0	0	0	0	0	0	0	0	...	12456	13036	13659	14525	15205	15750	16509	17267	18054	866.0
Albania	0	0	0	0	0	0	0	0	0	0	...	1050	1076	1099	1122	1137	1143	1164	1184	1197	34.0
Algeria	0	0	0	0	0	0	0	0	0	0	...	8857	8997	9134	9267	9394	9513	9626	9733	9831	199.0
Andorra	0	0	0	0	0	0	0	0	0	0	...	763	763	764	764	764	765	844	851	852	79.0
Angola	0	0	0	0	0	0	0	0	0	0	...	71	74	81	84	86	86	86	86	86	8.0

5 rows × 136 columns
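The loop above could also be written without an explicit loop. The sketch below uses a made-up two-country frame to show that differencing along the columns and taking the row-wise maximum gives the same numbers. It is an alternative for illustration, not what this notebook ran.

```python
import pandas as pd

# Hypothetical cumulative counts: one row per country, one column per day
toy = pd.DataFrame(
    {"d1": [0, 0], "d2": [2, 1], "d3": [7, 1]},
    index=pd.Index(["X", "Y"], name="Country/Region"),
)

# Loop version, same idea as in the notebook
loop_result = [toy.loc[c].diff().max() for c in toy.index]

# Vectorized version: difference across columns, then maximum of each row
vectorized = toy.diff(axis=1).max(axis=1)
print(vectorized)
```

Either way, the maxima should be computed before the new column is attached, otherwise the new column itself would be treated as one more day in the difference.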

Creating a new DataFrame with only the countries and their maximum infection rate

corona_data = pd.DataFrame(aggregated_corona_dataset["max_infection_rates"])
corona_data.head()

	max_infection_rates
Country/Region
Afghanistan	866.0
Albania	34.0
Algeria	199.0
Andorra	79.0
Angola	8.0

Importing the World Happiness Report dataset

happiness_report_csv = pd.read_csv("Datasets/worldwide_happiness_report.csv")
happiness_report_csv.head()

	Overall rank	Country or region	Score	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices	Generosity	Perceptions of corruption
0	1	Finland	7.769	1.340	1.587	0.986	0.596	0.153	0.393
1	2	Denmark	7.600	1.383	1.573	0.996	0.592	0.252	0.410
2	3	Norway	7.554	1.488	1.582	1.028	0.603	0.271	0.341
3	4	Iceland	7.494	1.380	1.624	1.026	0.591	0.354	0.118
4	5	Netherlands	7.488	1.396	1.522	0.999	0.557	0.322	0.298

Deleting the unnecessary columns and setting the index to Country or region

useless_cols = ["Overall rank", "Score", "Generosity", "Perceptions of corruption"]
happiness_report_csv.drop(useless_cols, axis = 1, inplace = True)
happiness_report_csv.set_index("Country or region", inplace= True)
happiness_report_csv.head()

	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices
Country or region
Finland	1.340	1.587	0.986	0.596
Denmark	1.383	1.573	0.996	0.592
Norway	1.488	1.582	1.028	0.603
Iceland	1.380	1.624	1.026	0.591
Netherlands	1.396	1.522	0.999	0.557

Comparing the number of countries in the Happiness and Covid 19 datasets

corona_data.shape

(188, 1)

happiness_report_csv.shape

(156, 4)

The Corona dataset contains more countries than the World Happiness Report dataset.

So we have to join them using an inner join, which keeps only the countries present in both.

final_data = corona_data.join(happiness_report_csv, how = "inner")
final_data.head()

	max_infection_rates	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices
Afghanistan	866.0	0.350	0.517	0.361	0.000
Albania	34.0	0.947	0.848	0.874	0.383
Algeria	199.0	1.002	1.160	0.785	0.086
Argentina	949.0	1.092	1.432	0.881	0.471
Armenia	697.0	0.850	1.055	0.815	0.283
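An inner join matches rows on the index labels exactly, so a country spelled differently in the two files is silently dropped. The sketch below uses made-up frames to show the join and one way to list the labels that found no match. The "US" versus "United States" spelling is only an illustrative assumption; the real files should be checked.

```python
import pandas as pd

# Hypothetical stand-ins for corona_data and happiness_report_csv
cases = pd.DataFrame(
    {"max_infection_rates": [10.0, 20.0, 30.0]},
    index=pd.Index(["Nepal", "US", "Finland"], name="Country/Region"),
)
happy = pd.DataFrame(
    {"GDP per capita": [0.4, 1.4, 1.3]},
    index=pd.Index(["Nepal", "United States", "Finland"], name="Country or region"),
)

# Keep only the countries whose label appears in both indexes
joined = cases.join(happy, how="inner")

# Labels from the case data that had no exact match in the happiness data
dropped = cases.index.difference(happy.index)
print(joined)
print(list(dropped))
```

Running the same difference() check on the real indexes would show which countries are missing from final_data and whether any of them are lost only because of naming.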

Calculating the correlation matrix for the final data

final_data.corr()

	max_infection_rates	GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices
max_infection_rates	1.000000	0.207071	0.158977	0.218118	0.071825
GDP per capita	0.207071	1.000000	0.757521	0.859431	0.394799
Social support	0.158977	0.757521	1.000000	0.751632	0.456317
Healthy life expectancy	0.218118	0.859431	0.751632	1.000000	0.423146
Freedom to make life choices	0.071825	0.394799	0.456317	0.423146	1.000000
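The matrix above uses the raw maximum infection rates, while the plots in this notebook use their logarithm. As a hedged sketch on made-up numbers, the two can give quite different correlation values when the counts span several orders of magnitude.

```python
import numpy as np
import pandas as pd

# Hypothetical values: the rates grow exponentially with the indicator
toy = pd.DataFrame({
    "GDP per capita": [0.5, 1.0, 1.5, 2.0],
    "max_infection_rates": [10.0, 100.0, 1000.0, 10000.0],
})

# Pearson correlation on the raw rates
raw_corr = toy["GDP per capita"].corr(toy["max_infection_rates"])

# Pearson correlation on the log-transformed rates
log_corr = toy["GDP per capita"].corr(np.log(toy["max_infection_rates"]))
print(raw_corr, log_corr)
```

In this toy case the log-transformed correlation is essentially 1 because the relation is exactly log-linear. Real data will not be that clean, so this only shows why the transformation matters.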

Visualizing our final result

Plotting GDP vs Maximum Infection Rate

x = final_data["GDP per capita"]
y = final_data["max_infection_rates"]
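# x and y are passed as keywords because recent seaborn versions no longer accept them positionally
sns.regplot(x=x, y=np.log(y)).set_title("Relationship Between Corona Infection Rate and GDP per Capita")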

Text(0.5, 1.0, 'Relationship Between Corona Infection Rate and GDP per Capita')

Plotting Social support vs Maximum Infection Rate

x = final_data["Social support"]
y = final_data["max_infection_rates"]
sns.regplot(x=x, y=np.log(y)).set_title("Relationship Between Corona Infection Rate and Social Support")

Text(0.5, 1.0, 'Relationship Between Corona Infection Rate and Social Support')

Plotting Healthy life expectancy vs Maximum Infection Rate

x = final_data["Healthy life expectancy"]
y = final_data["max_infection_rates"]
sns.regplot(x=x, y=np.log(y)).set_title("Relationship Between Corona Infection Rate and Health Life Expectancy")

Text(0.5, 1.0, 'Relationship Between Corona Infection Rate and Health Life Expectancy')

Plotting Freedom to make life choices vs Maximum Infection Rate

x = final_data["Freedom to make life choices"]
y = final_data["max_infection_rates"]
sns.regplot(x=x, y=np.log(y)).set_title("Relationship Between Corona Infection Rate and Freedom to make life choices")

Text(0.5, 1.0, 'Relationship Between Corona Infection Rate and Freedom to make life choices')
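By default, the line drawn by regplot is an ordinary least-squares fit. As a rough sketch, the slope of such a fit can be read off numerically with np.polyfit. The numbers below are made up, and the filter for zero rates is an added precaution (np.log(0) is -inf), not something the notebook does.

```python
import numpy as np

# Hypothetical indicator values and maximum infection rates
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([np.exp(1.0), np.exp(3.0), 0.0, np.exp(7.0)])

# Drop countries with a zero rate, since their logarithm is not finite
mask = y > 0

# Degree-1 least-squares fit of log(y) against x; returns [slope, intercept]
slope, intercept = np.polyfit(x[mask], np.log(y[mask]), 1)
print(slope, intercept)
```

A positive slope means the log of the maximum infection rate rises with the indicator, which is the direction of the trend the four plots are meant to show.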
