hongqin / COVID-19_Unified-Dataset

Unified COVID-19 Dataset from Johns Hopkins CSSE

Unified COVID-19 Dataset

Copyright: © 2023 JHU | Credits: NASA/NIH | DOI: 10.1101/2021.05.05.21256712

This is a unified COVID-19 dataset to fulfill the following objectives:

  • Mapping all geospatial units globally into a unique standardized ID.
  • Standardizing administrative names and codes at all levels.
  • Standardizing dates, data types, and formats.
  • Unifying variable names, types, and categories.
  • Merging data from all credible sources at all levels.
  • Cleaning the data and fixing confusing entries.
  • Integrating hydrometeorological variables at all levels.
  • Integrating population-weighted hydrometeorological variables.
  • Integrating air quality, comorbidities, WorldPop, and other static data.
  • Integrating policy data from the Oxford COVID-19 Government Response Tracker (OxCGRT).
  • Integrating vaccination data from JHU Centers for Civic Impact.
  • Integrating estimates of daily infections (cases by date of infection).
  • Integrating an augmented version from all sources (future releases).
  • Generating epidemiological estimates of infections and effective reproduction number.
  • Optimizing the data for machine learning applications.
  • Providing multiple data formats, including the lightning-fast fst format.
  • Providing code to efficiently load and combine/subset the datasets (coming soon).

Coverage Map

COVID-19 Coverage

Geospatial ID

COVID-19 ID

Note that for some European countries, the COVID-19 data in the global daily reports from the Johns Hopkins University (JHU) Center for Systems Science and Engineering (CSSE) are reported at the province level; these will be replaced by higher-resolution data at NUTS 0-3 levels.

COVID-19 Data Structure

| Column | Type | Description |
|---|---|---|
| ID | Character | Geospatial ID, unique identifier |
| Date | Date | Date of data record |
| Cases | Integer | Number of cumulative cases |
| Cases_New | Integer | Number of new daily cases |
| Type | Character | Type of the reported cases |
| Age | Character | Age group of the reported cases |
| Sex | Character | Sex/gender of the reported cases |
| Source | Character | Data source: JHU, CTP, NYC, NYT, UVA, SES, DPC, RKI, JRC, IHME |
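The relationship between the cumulative Cases column and the daily Cases_New column can be illustrated with a minimal Python sketch. It is not the dataset's own loading code: the sample rows, the ID value "X1", the "Total" labels for Age and Sex, and the helper name add_daily_counts are all hypothetical, and leaving the first day of each series as None is an assumption rather than documented behavior.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows that use the documented column names.
# The ID "X1", the "Total" labels, and all counts are made up for illustration.
rows = [
    {"ID": "X1", "Date": date(2020, 3, 2), "Cases": 15, "Type": "Confirmed",
     "Age": "Total", "Sex": "Total", "Source": "JHU"},
    {"ID": "X1", "Date": date(2020, 3, 1), "Cases": 10, "Type": "Confirmed",
     "Age": "Total", "Sex": "Total", "Source": "JHU"},
    {"ID": "X1", "Date": date(2020, 3, 3), "Cases": 23, "Type": "Confirmed",
     "Age": "Total", "Sex": "Total", "Source": "JHU"},
    {"ID": "X1", "Date": date(2020, 3, 1), "Cases": 1, "Type": "Deaths",
     "Age": "Total", "Sex": "Total", "Source": "JHU"},
    {"ID": "X1", "Date": date(2020, 3, 2), "Cases": 1, "Type": "Deaths",
     "Age": "Total", "Sex": "Total", "Source": "JHU"},
]


def add_daily_counts(rows):
    """Return copies of rows with Cases_New set to the day-over-day change in Cases.

    Rows are grouped into one series per (ID, Type, Age, Sex, Source) and
    sorted by Date. The first row of each series gets None (an assumption).
    """
    series = defaultdict(list)
    for row in rows:
        key = (row["ID"], row["Type"], row["Age"], row["Sex"], row["Source"])
        series[key].append(row)

    out = []
    for group in series.values():
        group.sort(key=lambda r: r["Date"])
        previous = None
        for row in group:
            new = None if previous is None else row["Cases"] - previous
            out.append({**row, "Cases_New": new})
            previous = row["Cases"]
    return out


for record in add_daily_counts(rows):
    print(record["Type"], record["Date"], record["Cases"], record["Cases_New"])
```

Grouping on every identifying column keeps series from different sources, case types, or demographic strata from being differenced against each other.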

Case Types

| Type | Description |
|---|---|
| Active | Active cases |
| Confirmed | Confirmed cases |
| Deaths | Deaths |
| Home_Confinement | Home confinement / isolation |
| Hospitalized | Total hospitalized cases excluding intensive care units |
| Hospitalized_Now | Currently hospitalized cases excluding intensive care units |
| Hospitalized_Sym | Symptomatic hospitalized cases excluding intensive care units |
| ICU | Total cases in intensive care units |
| ICU_Now | Currently in intensive care units |
| Infections | Estimated infections |
| Negative | Negative tests |
| Pending | Pending tests |
| Positive | Positive tests, including hospitalized cases and home confinement |
| Positive_Dx | Positive cases emerging from clinical activity / diagnostics |
| Positive_Sc | Positive cases emerging from surveys and tests |
| Recovered | Recovered cases |
| Tested | Cases tested = Tests - Pending |
| Tests | Total performed tests |
| Ventilator | Total cases receiving mechanical ventilation |
| Ventilator_Now | Currently receiving mechanical ventilation |
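The one derived type above, Tested = Tests - Pending, can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names pivot_by_type and add_tested, the ID "X1", and the sample counts are hypothetical, and the sketch assumes a single row per Type for each ID and date (it ignores Age, Sex, and Source).

```python
from collections import defaultdict


def pivot_by_type(rows):
    """Group long-format rows into {(ID, Date): {Type: Cases}}."""
    wide = defaultdict(dict)
    for row in rows:
        wide[(row["ID"], row["Date"])][row["Type"]] = row["Cases"]
    return dict(wide)


def add_tested(wide):
    """Add Tested = Tests - Pending wherever both inputs are present."""
    out = {}
    for key, counts in wide.items():
        counts = dict(counts)  # copy so the input is left unchanged
        if "Tests" in counts and "Pending" in counts:
            counts["Tested"] = counts["Tests"] - counts["Pending"]
        out[key] = counts
    return out


# Hypothetical sample: made-up ID and counts, for illustration only.
sample = [
    {"ID": "X1", "Date": "2020-04-01", "Type": "Tests", "Cases": 1000},
    {"ID": "X1", "Date": "2020-04-01", "Type": "Pending", "Cases": 40},
    {"ID": "X1", "Date": "2020-04-02", "Type": "Tests", "Cases": 1200},
]
result = add_tested(pivot_by_type(sample))
print(result)
```

Dates without a Pending count are left without a Tested value rather than treating the missing count as zero, since the source does not say how gaps should be handled.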

Lookup Table

Lookup Table

Epidemiological Estimates

COVID-19 Estimates

Static Data Structure

Static Data README

Hydromet Data Structure

Hydromet README

Policy Data Structure

Policy README

Vaccine Data Structure

Vaccine README

Data Sources

| Source | Description | Level |
|---|---|---|
| JHU | Johns Hopkins University CSSE | Global & County/State, United States |
| CTP | The COVID Tracking Project | State, United States |
| NYC | New York City Department of Health and Mental Hygiene | ZCTA/Borough, New York City |
| NYT | The New York Times | County/State, United States |
| UVA | University of Virginia School of Medicine | Municipality/State, South America |
| SES | Monitoring COVID-19 Cases and Deaths in Brazil | Municipality/State/Country, Brazil |
| DPC | Italian Civil Protection Department | NUTS 0-3, Italy |
| RKI | Robert Koch-Institut, Germany | NUTS 0-3, Germany |
| JRC | Joint Research Centre | Global & NUTS 0-3, Europe |
| ERA5 | The fifth generation of ECMWF reanalysis | All levels |
| NLDAS | North American Land Data Assimilation System | County/State, United States |
| CIESIN | Center for International Earth Science Information Network | Global gridded population |
| OxCGRT | Oxford COVID-19 Government Response Tracker | National (global) & subnational (US, UK) |
| CRC | Johns Hopkins Centers for Civic Impact | National (global) & subnational (US) |
| IHME | Institute for Health Metrics and Evaluation | National (global) & subnational (US) |
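When working with the Source column, it can help to check records against the source codes listed above. The following Python sketch shows one possible way to do that; the SOURCES mapping is transcribed from the table, while the helper name unknown_sources and the sample records are hypothetical.

```python
# Source codes and descriptions transcribed from the Data Sources table.
SOURCES = {
    "JHU": "Johns Hopkins University CSSE",
    "CTP": "The COVID Tracking Project",
    "NYC": "New York City Department of Health and Mental Hygiene",
    "NYT": "The New York Times",
    "UVA": "University of Virginia School of Medicine",
    "SES": "Monitoring COVID-19 Cases and Deaths in Brazil",
    "DPC": "Italian Civil Protection Department",
    "RKI": "Robert Koch-Institut, Germany",
    "JRC": "Joint Research Centre",
    "ERA5": "The fifth generation of ECMWF reanalysis",
    "NLDAS": "North American Land Data Assimilation System",
    "CIESIN": "Center for International Earth Science Information Network",
    "OxCGRT": "Oxford COVID-19 Government Response Tracker",
    "CRC": "Johns Hopkins Centers for Civic Impact",
    "IHME": "Institute for Health Metrics and Evaluation",
}


def unknown_sources(rows):
    """Return the sorted Source codes in rows that are not in SOURCES."""
    seen = {row["Source"] for row in rows}
    return sorted(seen - set(SOURCES))


# Hypothetical records: "XYZ" is a made-up code used to show the check.
records = [{"Source": "JHU"}, {"Source": "RKI"}, {"Source": "XYZ"}]
print(unknown_sources(records))
```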

Credits

This work is supported by NASA Health & Air Quality project 80NSSC18K0327, under a COVID-19 supplement, and National Institutes of Health (NIH) project 3U19AI135995-03S1 ("Consortium for Viral Systems Biology (CViSB)"; collaboration with The Scripps Research Institute and UCLA).

Citation

To cite this dataset:

Badr, H. S., B. F. Zaitchik, G. H. Kerr, N. Nguyen, Y. Chen, P. Hinson, J. M. Colston, M. N. Kosek, E. Dong, H. Du, M. Marshall, K. Nixon, A. Mohegh, D. L. Goldberg, S. C. Anenberg, and L. M. Gardner, 2021: Unified real-time environmental-epidemiological data for multiscale modeling of the COVID-19 pandemic. medRxiv, 2021.05.05.21256712.

BibTeX

@article {Badr2021.05.05.21256712,
	author = {Badr, Hamada S. and Zaitchik, Benjamin F. and Kerr, Gaige H. and Nguyen, Nhat-Lan and Chen, Yen-Ting and Hinson, Patrick and Colston, Josh M. and Kosek, Margaret N. and Dong, Ensheng and Du, Hongru and Marshall, Maximilian and Nixon, Kristen and Mohegh, Arash and Goldberg, Daniel L. and Anenberg, Susan C. and Gardner, Lauren M.},
	title = {Unified real-time environmental-epidemiological data for multiscale modeling of the COVID-19 pandemic},
	elocation-id = {2021.05.05.21256712},
	year = {2021},
	doi = {10.1101/2021.05.05.21256712},
	publisher = {Cold Spring Harbor Laboratory Press},
	abstract = {An impressive number of COVID-19 data catalogs exist. None, however, are optimized for data science applications, e.g., inconsistent naming and data conventions, uneven quality control, and lack of alignment between disease data and potential predictors pose barriers to robust modeling and analysis. To address this gap, we generated a unified dataset that integrates and implements quality checks of the data from numerous leading sources of COVID-19 epidemiological and environmental data. We use a globally consistent hierarchy of administrative units to facilitate analysis within and across countries. The dataset applies this unified hierarchy to align COVID-19 case data with a number of other data types relevant to understanding and predicting COVID-19 risk, including hydrometeorological data, air quality, information on COVID-19 control policies, and key demographic characteristics. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This work is supported by NASA Health \& Air Quality project 80NSSC18K0327, under a COVID-19 supplement, National Institute of Health (NIH) project 3U19AI135995-03S1 ("Consortium for Viral Systems Biology (CViSB)"; Collaboration with The Scripps Research Institute and UCLA), and NASA grant 80NSSC20K1122. Johns Hopkins Applied Physics Laboratory (APL), Data Services and Esri provide professional support on designing the automatic data collection structure, and maintaining the JHU CSSE GitHub repository. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: IRB approval is not required. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The source code used to clean, unify, aggregate, and merge the different data components from all sources will be available on GitHub. https://github.com/CSSEGISandData/COVID-19_Unified-Dataset},
	URL = {https://www.medrxiv.org/content/early/2021/05/07/2021.05.05.21256712},
	eprint = {https://www.medrxiv.org/content/early/2021/05/07/2021.05.05.21256712.full.pdf},
	journal = {medRxiv}
}
