COVID-19 Data Processing Pipelines and datasets
This repository hosts workflows that process several data sources, along with the resulting cleaned datasets of COVID-19 cases across the world.
Datasets
Historical (daily) case data
output/cases/cases_ECDC.csv
: European Centre for Disease Prevention and Control (ECDC) historical world-wide case data (currently through Our World in Data).
output/cases/cases_us_states_nyt.csv
: US state-level historical case data from The New York Times.
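To get a first look at one of these files, a small helper along the following lines may be useful. It is only a sketch: `peek_csv` is a hypothetical name, and the code deliberately makes no assumption about the column names, which are not documented here. It simply reports whatever header the file has.

```python
import csv
from itertools import islice
from pathlib import Path


def peek_csv(path, n_rows=3):
    """Return (column names, first n_rows rows as dicts) for a CSV file.

    No schema is assumed; the header is read from the file itself.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(islice(reader, n_rows))
        return reader.fieldnames, rows


if __name__ == "__main__":
    # Path taken from the dataset list above; run from the repository root.
    path = Path("output/cases/cases_ECDC.csv")
    if path.exists():
        columns, rows = peek_csv(path)
        print("columns:", columns)
        for row in rows:
            print(row)
    else:
        print(f"{path} not found; run this from the repository root.")
```

The same helper should work for `output/cases/cases_us_states_nyt.csv`, since it only depends on the file having a header row.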
Country metadata
- Country metadata:
output/metadata/country/country_metadata.csv
from the World Bank.
- ISO 3166-1 Alpha-3 country code conversion tables:
output/metadata/country/country_name_code.csv
: a conversion table from country name to code (ISO 3166-1 Alpha-3). Note that multiple names point to the same code.
output/metadata/country/country_code_name.csv
: a conversion table from country code (ISO 3166-1 Alpha-3) to country name. The shortest country name for each code is picked from the above dataset.
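The "shortest name per code" rule can be sketched as below. This is an illustration, not the repository's actual workflow code: the function name, the alphabetical tie-break for names of equal length, and the sample name variants are all assumptions made for the example.

```python
def build_code_to_name(name_to_code):
    """Invert a name -> code mapping, keeping the shortest name for each code.

    Ties between equally short names are broken alphabetically
    (an assumption; the source does not specify a tie-break).
    """
    code_to_name = {}
    for name, code in name_to_code.items():
        current = code_to_name.get(code)
        if current is None or (len(name), name) < (len(current), current):
            code_to_name[code] = name
    return code_to_name


# Illustrative name variants only; not copied from country_name_code.csv.
example_name_to_code = {
    "United States": "USA",
    "United States of America": "USA",
    "US": "USA",
    "South Korea": "KOR",
    "Korea, Republic of": "KOR",
}

example_code_to_name = build_code_to_name(example_name_to_code)
print(example_code_to_name)
```

Because several names map to one code, the name-to-code table is the lossless direction; the code-to-name table is a derived, one-name-per-code view.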
Historical case data for visualizations
cntry_stat_owid.json
: ECDC historical data merged with the World Bank's country metadata and ISO 3166-1 Alpha-3 data. Used in:
- an interactive visualization of case fatality rate of COVID-19
  - Website source code: https://github.com/covid19-data/covid19-dashboard
  - Visualization source code on ObservableHQ: https://observablehq.com/@yy/covid-19-fatality-rate and https://observablehq.com/@yy/covid-19-trends
- An example to create case time series charts in ObservableHQ by benjyz
us_state_nyt.json
: New York Times historical data. Used in:
- an interactive visualization of case fatality rate of COVID-19
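As a rough illustration of what "merged with country metadata" and "case fatality rate" mean for these files, the sketch below joins case records to metadata on a country code and computes deaths divided by cases. All field names (`country_code`, `total_cases`, `total_deaths`, `region`), the placeholder codes, and the numbers are hypothetical; the actual JSON layout is not described here and may differ.

```python
def merge_with_metadata(case_rows, metadata_by_code):
    """Attach metadata to each case record and add a case fatality rate.

    case_rows: list of dicts with hypothetical keys
        'country_code', 'total_cases', 'total_deaths'.
    metadata_by_code: dict mapping a country code to a dict of metadata.
    """
    merged = []
    for row in case_rows:
        meta = metadata_by_code.get(row["country_code"], {})
        cases = row["total_cases"]
        deaths = row["total_deaths"]
        # Case fatality rate = deaths / confirmed cases; undefined with no cases.
        cfr = deaths / cases if cases else None
        merged.append({**row, **meta, "case_fatality_rate": cfr})
    return merged


# Synthetic records with placeholder codes; not real data.
example_cases = [
    {"country_code": "AAA", "total_cases": 200, "total_deaths": 4},
    {"country_code": "BBB", "total_cases": 0, "total_deaths": 0},
]
example_metadata = {"AAA": {"region": "Region 1"}}

example_merged = merge_with_metadata(example_cases, example_metadata)
for record in example_merged:
    print(record)
```

Records whose code has no metadata entry are kept without extra fields, which makes missing codes easy to spot rather than silently dropping rows.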
Deprecated
The WHO dataset is deprecated. See Our World in Data's announcement, "Why we stopped relying on data from the World Health Organization".
output/cases/cases_WHO.csv
- https://www.worldometers.info/coronavirus/
coordinates.csv
: Latitude and longitude location data from the JHU dataset (unreliable).
Usage
Install pandas, snakemake, and numpy using conda:
conda install -c bioconda -c conda-forge snakemake pandas numpy
or pip:
pip install pandas snakemake numpy
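After installing, one way to confirm that the packages are visible to the current Python environment is a short check like the one below. It is a convenience sketch using only the standard library (`importlib.metadata`, Python 3.8+); `installed_versions` is a hypothetical helper name, not part of this repository.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["pandas", "snakemake", "numpy"]).items():
    print(f"{name}: {version or 'not installed'}")
```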