This project was initiated by the COVID19 Global Forecasting Kaggle competition and uses data science to forecast the spread of the Coronavirus around the world. A pandemic is a heavy topic for everyone. I wanted to contribute my knowledge of data science to help discover patterns in the Coronavirus spread and the important features that affect it. Hopefully my findings can help some regions take the correct actions.
The techniques I am planning to use for forecasting are:
- ARIMA
- Seq2Seq + LSTM (Deep Learning)
- XGBoost (Machine Learning)
covid19 - EDA.ipynb
- Notebook performing Exploratory Data Analysis on global confirmed cases and deaths before June 10th
covid19 - ARIMA.ipynb
- Notebook using ARIMA models to forecast global confirmed cases and deaths
(Note: You can find all of this data in the data folder of this GitHub repository.)
train.csv
test.csv
submission.csv
time_series_covid19_confirmed_global.csv
time_series_covid19_deaths_global.csv
time_series_covid19_recovered_global.csv
time_series_covid19_confirmed_US.csv
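
As a rough illustration of how the global time series files might be read, here is a minimal sketch using only the Python standard library. It assumes the wide layout commonly used for these files (four metadata columns, Province/State, Country/Region, Lat, Long, followed by one column of cumulative counts per date); that layout is an assumption and should be checked against the actual files in the data folder. The sample rows, the country names, and the helper names `totals_by_country` and `daily_new` are made up for illustration and are not taken from the notebooks.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the ASSUMED wide layout of time_series_covid19_*_global.csv:
# four metadata columns, then one cumulative-count column per date.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Exampleland,0.0,0.0,1,3,6
North,Samplestan,1.0,1.0,0,2,2
South,Samplestan,2.0,2.0,5,5,9
"""

META_COLUMNS = 4  # assumption: Province/State, Country/Region, Lat, Long


def totals_by_country(csv_file):
    """Sum the cumulative counts of all provinces into one series per country."""
    reader = csv.reader(csv_file)
    header = next(reader)
    dates = header[META_COLUMNS:]
    totals = defaultdict(lambda: [0] * len(dates))
    for row in reader:
        country = row[1]
        for i, value in enumerate(row[META_COLUMNS:]):
            totals[country][i] += int(value)
    return dates, dict(totals)


def daily_new(cumulative):
    """Turn a cumulative series into day-over-day increases."""
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


dates, totals = totals_by_country(io.StringIO(SAMPLE))
for country, series in totals.items():
    print(country, series, daily_new(series))
```

To try this on a real file, the `io.StringIO(SAMPLE)` argument could be replaced with an opened file handle, for example `open("data/time_series_covid19_confirmed_global.csv", newline="")`, assuming that path matches the repository layout.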