Olliang / COVID-19-Forecasting

A self-driven project using ARIMA, Seq2Seq, and XGBoost to design a COVID-19 forecasting algorithm. Data sources are the Kaggle COVID19 Global Forecasting competition and the JHU CSSE COVID-19 dataset.


COVID-19-Forecasting


Introduction


This project was started in response to the Kaggle COVID19 Global Forecasting competition, which uses data science to forecast the spread of the coronavirus around the world. The pandemic weighs on everyone, and I wanted to contribute my data science knowledge to help uncover patterns in how the coronavirus spreads and which features affect that spread. I hope my findings can help some regions take the right actions.

The techniques I am planning to use for forecasting are:

  • ARIMA
  • Seq2Seq + LSTM (Deep Learning)
  • XGBoost (Machine Learning)
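Both XGBoost and a Seq2Seq model learn from supervised (input, target) pairs rather than from a raw time series, so the daily counts first have to be cut into sliding windows. Below is a minimal sketch of that framing, assuming a plain list of daily counts; the function name make_windows and the window lengths n_in and n_out are hypothetical choices, not taken from the notebooks, and the numbers are toy data.

```python
from typing import List, Tuple


def make_windows(series: List[float], n_in: int, n_out: int) -> Tuple[List[List[float]], List[List[float]]]:
    """Split a daily series into (input window, target window) pairs.

    n_in:  number of past days used as inputs (hypothetical choice).
    n_out: number of future days to predict (the forecast horizon).
    """
    if n_in < 1 or n_out < 1:
        raise ValueError("n_in and n_out must be positive")
    inputs, targets = [], []
    # Slide a window of n_in + n_out consecutive days across the series.
    for start in range(len(series) - n_in - n_out + 1):
        inputs.append(series[start:start + n_in])
        targets.append(series[start + n_in:start + n_in + n_out])
    return inputs, targets


daily_new_cases = [1, 3, 6, 10, 15, 21, 28]  # toy values, not real case counts
X, y = make_windows(daily_new_cases, n_in=3, n_out=2)
print(X[0], y[0])  # [1, 3, 6] [10, 15]
print(len(X))      # 3
```

For XGBoost, each input window becomes one feature row; one option is to train a separate regressor per forecast step. For Seq2Seq + LSTM, the encoder reads the n_in input steps and the decoder emits the n_out target steps.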

Main Files


  • covid19 - EDA.ipynb - Notebook performing exploratory data analysis on global confirmed cases and deaths before June 10th
  • covid19 - ARIMA.ipynb - Notebook using ARIMA models to forecast global confirmed cases and deaths
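To make the ARIMA step concrete, here is a minimal pure-Python sketch of an ARIMA(p, 1, 0) model: the series is differenced once (d = 1), an AR(p) model with an intercept is fitted to the differences by least squares, and forecasts are re-integrated back into levels. This is an illustration under stated assumptions, not the code in covid19 - ARIMA.ipynb: it omits moving-average terms (q = 0), uses conditional least squares instead of maximum likelihood, and all function names are hypothetical. In practice a library such as statsmodels (`statsmodels.tsa.arima.model.ARIMA`) would fit the full ARIMA(p, d, q) model.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system; try a smaller p or a longer series")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def difference(series: List[float]) -> List[float]:
    """First difference (the 'I' in ARIMA with d = 1)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_arima_p10(series: List[float], p: int) -> List[float]:
    """Fit ARIMA(p, 1, 0) by conditional least squares.

    Model on the differences: d_t = c + phi_1*d_{t-1} + ... + phi_p*d_{t-p} + noise.
    Returns [c, phi_1, ..., phi_p].
    """
    diff = difference(series)
    if len(diff) - p < p + 1:
        raise ValueError("series too short for the requested p")
    rows, targets = [], []
    for t in range(p, len(diff)):
        rows.append([1.0] + [diff[t - i] for i in range(1, p + 1)])
        targets.append(diff[t])
    k = p + 1
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def forecast(series: List[float], params: List[float], steps: int) -> List[float]:
    """Iterate the AR recursion on the differences, then re-integrate into levels."""
    p = len(params) - 1
    diff = difference(series)
    level = series[-1]
    out = []
    for _ in range(steps):
        d = params[0] + sum(params[i] * diff[-i] for i in range(1, p + 1))
        diff.append(d)
        level += d  # undo the differencing: next level = last level + predicted change
        out.append(level)
    return out


# Toy cumulative series whose daily changes follow d_t = 2 + 0.5 * d_{t-1} exactly.
cumulative = [10, 10, 12, 15, 18.5, 22.25, 26.125, 30.0625]
params = fit_arima_p10(cumulative, p=1)
print([round(v, 6) for v in params])    # [2.0, 0.5]
print(forecast(cumulative, params, 2))  # approximately [34.03125, 38.015625]
```

Differencing cumulative confirmed cases once means the AR part is effectively modelling daily new cases. The order p would normally be chosen by inspecting the ACF/PACF plots or by comparing an information criterion such as AIC.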

Data Sources

(Note: all of these files can be found in the data folder of this GitHub repository.)

  1. Kaggle: COVID19 Global Forecasting (Week 5)
  • train.csv
  • test.csv
  • submission.csv

  2. JHU CSSE COVID-19 Dataset
  • time_series_covid19_confirmed_global.csv
  • time_series_covid19_deaths_global.csv
  • time_series_covid19_recovered_global.csv
  • time_series_covid19_confirmed_US.csv
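The JHU CSSE global time-series files are wide tables: four metadata columns (Province/State, Country/Region, Lat, Long) followed by one column per date holding cumulative counts. Below is a minimal sketch, using only the standard-library csv module, that sums every row into global daily totals and converts them to daily new counts. The function names and the inline sample are hypothetical toy data. The *_US.csv file has a different, longer set of metadata columns, so the offset of 4 applies only to the *_global.csv files.

```python
import csv
import io
from typing import List, Tuple


def global_totals(csv_text: str) -> Tuple[List[str], List[int]]:
    """Sum a JHU CSSE *_global.csv time series over all countries/regions.

    Assumes the wide layout: Province/State, Country/Region, Lat, Long,
    followed by one cumulative-count column per date.
    """
    reader = csv.reader(io.StringIO(csv_text))  # handles quoted names containing commas
    header = next(reader)
    dates = header[4:]  # date columns start after the four metadata columns
    totals = [0] * len(dates)
    for row in reader:
        for i, value in enumerate(row[4:]):
            totals[i] += int(float(value or 0))
    return dates, totals


def daily_new(cumulative: List[int]) -> List[int]:
    """First difference: new counts for each day after the first."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


sample = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Country A,0.0,0.0,1,2,4
"Region, X",Country B,0.0,0.0,0,1,1
"""
dates, totals = global_totals(sample)
print(dates)              # ['1/22/20', '1/23/20', '1/24/20']
print(totals)             # [1, 3, 5]
print(daily_new(totals))  # [2, 2]
```

If a cumulative count is ever revised downward, the corresponding daily value becomes negative, so it is worth checking for that before feeding the series to a model.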



Languages

Language: Jupyter Notebook (100.0%)