chrishas35 / covid-19-datasette

Deploys a Datasette instance of COVID-19 data from Johns Hopkins CSSE and the New York Times

Home Page: https://covid-19.datasettes.com/


covid-19-datasette


Deploys a Datasette instance with data from two sources: Johns Hopkins CSSE and the New York Times.

The Datasette instance lives at https://covid-19.datasettes.com/ and is updated every two hours using a scheduled GitHub Action.
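As a rough illustration of how the hosted instance could be queried, the sketch below builds a URL for Datasette's JSON API and optionally fetches it. The database name `covid` and the table name `johns_hopkins_csse_daily_reports` are assumptions used for illustration; check the instance's index page for the real names before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://covid-19.datasettes.com"


def build_table_url(database, table, filters=None, size=100):
    """Build a Datasette JSON API URL for one table.

    `_shape=array` asks Datasette for a plain list of row objects, and
    `column=value` pairs in `filters` become exact-match filters.
    """
    params = {"_shape": "array", "_size": size}
    params.update(filters or {})
    return f"{BASE_URL}/{database}/{table}.json?{urlencode(params)}"


def fetch_rows(url):
    """Fetch the URL and decode the JSON response (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


# Hypothetical database and table names, shown only as an example.
url = build_table_url(
    "covid",
    "johns_hopkins_csse_daily_reports",
    {"country_or_region": "Italy"},
)
print(url)
```

Calling `fetch_rows(url)` would return a list of dictionaries, one per row, provided the instance is reachable and the assumed names match.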

Please do not use this tool to share information about COVID-19 without making absolutely sure you understand how the data is structured and sourced.

More about this project: COVID-19 numbers in Datasette.

This repository uses the deployment pattern described in Deploying a data API using GitHub Actions and Cloud Run.

Johns Hopkins

The database is partly built from the daily report CSV files in the Johns Hopkins CSSE csse_covid_19_data folder - be sure to consult their README for documentation of the fields.

They are actively making changes to how they report data. You should follow their issues closely for updates - for example this issue about switching from reporting USA data at the county level to the state level.

The build script for the database makes one alteration to their data: it attempts to fill any missing latitude and longitude values with values from similar rows.

If you are going to make use of those columns, make sure you understand how that backfill mechanism works in case it affects your calculations in some way.
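To make the idea of a backfill concrete, here is a minimal sketch of one way such a mechanism could work. It is not the repository's build script: it assumes that "similar rows" means rows sharing the same `country_or_region` and `province_or_state`, and that the coordinate columns are named `latitude` and `longitude`. The actual script may match rows differently.

```python
def backfill_coordinates(rows):
    """Return copies of `rows` with missing coordinates filled where possible.

    Assumption: two rows are "similar" when they share the same
    (country_or_region, province_or_state) pair.
    """
    # First pass: remember the first known coordinates for each place.
    known = {}
    for row in rows:
        key = (row.get("country_or_region"), row.get("province_or_state"))
        if row.get("latitude") is not None and row.get("longitude") is not None:
            known.setdefault(key, (row["latitude"], row["longitude"]))

    # Second pass: copy each row and fill gaps from the lookup table.
    filled = []
    for row in rows:
        new_row = dict(row)
        key = (row.get("country_or_region"), row.get("province_or_state"))
        missing = new_row.get("latitude") is None or new_row.get("longitude") is None
        if missing and key in known:
            new_row["latitude"], new_row["longitude"] = known[key]
        filled.append(new_row)
    return filled
```

A backfill like this means a coordinate in one row may have been copied from a different day's report, which matters if you compute distances or group rows by location.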

New York Times

The New York Times has a comprehensive README describing how their data is sourced. You should read it! They announced their data in We’re Sharing Coronavirus Case Data for Every U.S. County.

They are using the data for their Coronavirus in the U.S.: Latest Map and Case Count article.

Example issues

  • Remember: the number of reported cases is very heavily influenced by the availability of testing.
  • On 23rd March 2020 Johns Hopkins added four new columns to the daily CSV file: admin2, fips, active and combined_key. These are not present in older CSV files. See #4.
  • Some countries (like Italy) are represented by just the rows with country_or_region set to Italy (and province_or_state set to null). Larger countries such as the United States have multiple rows for each day divided into separate province_or_state values - example.
  • Santa Clara County appears to be represented as Santa Clara, CA in some records and Santa Clara County, CA in others - example.
  • Passengers from the Diamond Princess cruise are represented by a number of different rows with "From Diamond Princess" in their province_or_state column - example.
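Several of the issues above affect how per-country totals should be computed. The sketch below shows one way to sum confirmed cases per day and country so that single-row countries (null province_or_state) and multi-row countries are handled the same way. The column names `day` and `confirmed` are assumptions for illustration, and whether Diamond Princess rows belong in a country total is a judgment call that the function leaves to the caller.

```python
from collections import defaultdict


def confirmed_by_country(rows, exclude_diamond_princess=False):
    """Sum confirmed cases per (day, country_or_region).

    Assumed columns: day, country_or_region, province_or_state, confirmed.
    """
    totals = defaultdict(int)
    for row in rows:
        province = row.get("province_or_state") or ""
        if exclude_diamond_princess and "From Diamond Princess" in province:
            continue
        key = (row["day"], row["country_or_region"])
        # Treat a missing or null count as zero rather than failing.
        totals[key] += row.get("confirmed") or 0
    return dict(totals)


sample = [
    {"day": "2020-03-10", "country_or_region": "Italy",
     "province_or_state": None, "confirmed": 100},
    {"day": "2020-03-10", "country_or_region": "US",
     "province_or_state": "Washington", "confirmed": 10},
    {"day": "2020-03-10", "country_or_region": "US",
     "province_or_state": "New York", "confirmed": 20},
    {"day": "2020-03-10", "country_or_region": "US",
     "province_or_state": "Omaha, NE (From Diamond Princess)", "confirmed": 5},
]
print(confirmed_by_country(sample))
```

The sample rows are made up. Note that this grouping does not resolve naming inconsistencies such as Santa Clara, CA versus Santa Clara County, CA; those need an explicit mapping if you aggregate below the country level.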


