sergei-mironov / COVID-19_plus_Russia

COVID-19 data from JHU CSSE, updated with details on Russian regions.

COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University

Goals

  1. Provide a COVID-19 dataset containing detailed information on Russia.
  2. Maintain CSSE compatibility.
  3. Provide some higher-level APIs for accessing the data.
  4. Close the project after a more systematic approach is developed.

Disclaimer: the author has no relationship with any government or commercial organisation. The data provided here are collected from unreliable sources and may be inaccurate. Use them at your own risk.

Contents

Directory structure

Visualization

Daily cases in Russian regions, Top-10

English version of the plot

Daily cases in Russian regions, positions 11-21

English version of the plot

Confirmed cases in Russian regions, Top-10

English version of the plot

Confirmed cases in Russian regions, positions 11-21

English version of the plot

Data sources

Primary sources

Related repos

Update procedure

Originally, the author filled in the data on Moscow and Saint Petersburg manually, based on Rospotrebnadzor and NovelCoronaVirusChannel data. Starting from March 25, we follow the procedure below:

  1. Fetch hourly data from the Yandex COVID map.
    • Fetching is done by running the monitor function of the fetcher script.
    • The data is saved into the pending folder, stamped with UTC time.
  2. Fetch daily upstream updates manually, using a regular git fetch.
  3. If an update is available:
    1. Rebase the repository onto the upstream/master branch using git rebase.
    2. For every csse_covid_19_data/csse_covid_19_daily_reports file which doesn't have Russian details, do the following:
      1. Determine the update time of the 'Russia' record found in the world data. The time is assumed to be UTC and is often near 23:30.
      2. Find the Russian details dump in the pending folder which has the closest UTC timestamp.
      3. Update the world information file by inserting the Russian details manually.
      4. Review the format compatibility (CSV field order, date format, etc.).
      5. Run the checker script.
      6. Update the RU timeline by calling ru_timeline_dump() of access.py.
      7. Update the plots by running the plot script.
      8. Commit the changes and force-push them to this repository (the force is needed because of the rebase).
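
The "closest UTC timestamp" lookup from the procedure above could be sketched as follows. This is only an illustration, not the repository's actual code: the file naming scheme (pending/<UTC timestamp>.json), the STAMP_FORMAT constant, and the helper names parse_stamp and closest_dump are all assumptions made for the example.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical naming scheme for dumps: pending/2020-03-31T23-40-00.json
STAMP_FORMAT = "%Y-%m-%dT%H-%M-%S"

def parse_stamp(path):
    """Read the UTC timestamp encoded in a dump's file name."""
    stamp = datetime.strptime(Path(path).stem, STAMP_FORMAT)
    return stamp.replace(tzinfo=timezone.utc)

def closest_dump(paths, target):
    """Return the dump whose timestamp is nearest to `target` (a UTC datetime)."""
    return min(paths, key=lambda p: abs(parse_stamp(p) - target))

# Example: the 'Russia' record in the world data was updated at 23:30 UTC.
dumps = [
    "pending/2020-03-31T22-00-00.json",
    "pending/2020-03-31T23-40-00.json",
    "pending/2020-04-01T01-00-00.json",
]
target = datetime(2020, 3, 31, 23, 30, tzinfo=timezone.utc)
best = closest_dump(dumps, target)
```

With hourly dumps, the selected file is at most about half an hour away from the upstream update time.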

Roadmap

  • Python code to check the correctness of CSV files.
  • Python API to access the CSV data. It should handle the CSV format change which happened around 23.03.2020.
  • Semi-automated data loader from Yandex. Ideally, we want to perform the following actions:
    • Collect Confirmed/Deaths/Recovered info for each Russian city (starting from 03-25-2020.csv).
    • Save this information in a temporary file to handle the update gap.
    • Set correct Longitude/Latitude values for Russian regions.
    • Figure out what the 'Active' field means and how to get it.
      • It seems to be just Confirmed-Deaths-Recovered. The records which lack this value have to be updated.
  • Make periodic dumps of the Rospotrebnadzor site. Try to track possible sources of data inconsistency.
  • Auto-generate time series.
  • Change pre-02.06.2020 names of regions to match the upstream ones.
  • Update the dataset daily with information on Russian regions.
  • Find data on Russian regions for the pre-25.03.2020 period.
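
The guess about the 'Active' field in the roadmap could be applied to records which lack the value roughly as below. This is a minimal sketch, assuming the formula Active = Confirmed - Deaths - Recovered really holds (the roadmap itself only says it "seems" so). The function name fill_active and the reduced column set are made up for the example.

```python
import csv
import io

def fill_active(rows):
    """Fill an empty 'Active' value as Confirmed - Deaths - Recovered (assumed formula)."""
    for row in rows:
        if not row.get("Active"):
            active = int(row["Confirmed"]) - int(row["Deaths"]) - int(row["Recovered"])
            row["Active"] = str(active)
    return rows

# A reduced, illustrative record; real daily reports carry more columns.
sample = io.StringIO(
    "Province_State,Country_Region,Confirmed,Deaths,Recovered,Active\n"
    "Moscow,Russia,262,1,9,\n"
)
rows = fill_active(list(csv.DictReader(sample)))
```

Records which already carry an 'Active' value are left untouched, so the function can be run over a whole file safely.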

Log

11.06.2020

  • Finally changed region names to match the upstream.

02.06.2020

  • Found details on Russian regions in the upstream. It looks like they have started to track them as well.

03.05.2020

  • Decline in the number of deaths in Arkhangelsk oblast: 5 deaths were reported on 01.05.2020, but only 1 death remained in the reports on 02.05.2020.

30.04.2020

  • Another 'resurrected' case, this time in Lipetsk oblast: deaths decreased from 4 on 28.04.2020 to 2 on 29.04.2020.

28.04.2020

  • The number of deaths in Altayskiy kray decreased from 2 on 26.04.2020 to 1 on 27.04.2020.

25.04.2020

  • The shift towards the future increased. The file named '04-24-2020' contained data from approximately 04-25-2020, 06:30.
  • Update: this was a planned shift, ref. CSSEGISandData#2369.

24.04.2020

  • Noticed that the upstream published today's measurements with an unusually late timestamp. The file '04-23-2020' contains timestamps of 04-24-2020 03:41.

21.04.2020

  • Added moving average plots

17.04.2020

  • Split the plot into top10 and 10-20 plots for readability.

14.04.2020

13.04.2020

12.04.2020

  • No updates on Komi republic (3rd place among Russian regions) since 10.04.2020. Checked both Yandex and the Rospotrebnadzor site.

08.04.2020

  • More breaking changes from the upstream. The following daily data files have a mismatched data format and extra symbols at the line ends:
    • 03-21-2020.csv
    • 03-29-2020.csv
    • 03-30-2020.csv
    • 04-06-2020.csv
  • Updated issue CSSEGISandData#1523

01.04.2020

  • More errors from the checker script, this time for Crimea:
    Error(file='COVID-19_plus_Russia/csse_covid_19_data/csse_covid_19_daily_reports/03-31-2020.csv',
    text='Confirmed decreased for Republic of Crimea from 20 to 16')
    
    This means that the Yandex counters sometimes decrease. We can't name the reason; probably there were some corrections. One possible cause is the splitting of Crimea into Crimea and Sevastopol.
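
A check of this kind can be sketched as below. This is not the repository's checker script, only an illustration of the idea that cumulative counters should never decrease between two consecutive daily reports. The Error tuple mirrors the shape of the message quoted above; the function name check_non_decreasing and the nested-dict input format are assumptions.

```python
from collections import namedtuple

# Mirrors the shape of the error quoted in the log entry above.
Error = namedtuple("Error", ["file", "text"])

def check_non_decreasing(prev, curr, file):
    """Compare two {region: {field: count}} snapshots and report every decrease."""
    errors = []
    for region, counts in curr.items():
        for field, value in counts.items():
            old = prev.get(region, {}).get(field)
            if old is not None and value < old:
                errors.append(
                    Error(file, f"{field} decreased for {region} from {old} to {value}")
                )
    return errors

prev = {"Republic of Crimea": {"Confirmed": 20, "Deaths": 0}}
curr = {"Republic of Crimea": {"Confirmed": 16, "Deaths": 0}}
errors = check_non_decreasing(prev, curr, "03-31-2020.csv")
```

Regions which appear for the first time in the newer report (for example, after a region is split) produce no error here, because there is no earlier value to compare against.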

30.03.2020

  • The number of 'recovered' cases decreased in Sverdlovsk oblast.
  • Exact text of the error:
    Error(file='COVID-19_plus_Russia/csse_covid_19_data/csse_covid_19_daily_reports/03-29-2020.csv',
    text='Recovered decreased in Sverdlov oblast from 3 to 1 (oh no!)'),
    

25.03.2020

  • Conflict resolved. The 23-22-2020.csv file seems to have been damaged by the upstream admins.
  • CSSEGISandData#1523
  • Implemented Yandex data fetcher

23.03.2020

Upstream format change. The added lines now look like:

  • ,,Moscow,Russia,2020-03-24 00:00:00,55.75222,37.61556,262,1,9,,"Moscow, Russia"
  • ,,"Saint Petersburg",Russia,2020-03-22 00:00:00,59.93863,30.31413,16,0,2,,"Saint Petersburg, Russia"

21.03.2020

We augmented CSV files from csse_covid_19_daily_reports folder by adding lines like:

  • Moscow,Russia,2020-03-21T00:00:00,5,0,0,55.75222,37.61556
  • "Saint Petersburg",Russia,2020-03-21T00:00:00,4,0,2,59.93863,30.31413
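
An access API which has to cope with both layouts shown in the 23.03.2020 and 21.03.2020 entries might normalize lines as sketched below. The column names are assumptions inferred from the sample lines (the real files start with a header row, which is a more reliable way to tell the layouts apart), and parse_line is a hypothetical helper, not a function of access.py.

```python
import csv
import io

# Assumed column orders, inferred from the sample lines in the log.
OLD_COLUMNS = ["Province_State", "Country_Region", "Last_Update",
               "Confirmed", "Deaths", "Recovered", "Lat", "Long_"]
NEW_COLUMNS = ["FIPS", "Admin2", "Province_State", "Country_Region",
               "Last_Update", "Lat", "Long_", "Confirmed", "Deaths",
               "Recovered", "Active", "Combined_Key"]

def parse_line(line):
    """Parse one headerless CSV line in either layout into a dict with common keys."""
    fields = next(csv.reader(io.StringIO(line)))
    if len(fields) == len(OLD_COLUMNS):
        columns = OLD_COLUMNS
    elif len(fields) == len(NEW_COLUMNS):
        columns = NEW_COLUMNS
    else:
        raise ValueError(f"unexpected number of fields: {len(fields)}")
    return dict(zip(columns, fields))

old = parse_line('"Saint Petersburg",Russia,2020-03-21T00:00:00,4,0,2,59.93863,30.31413')
new = parse_line(',,"Saint Petersburg",Russia,2020-03-22 00:00:00,'
                 '59.93863,30.31413,16,0,2,,"Saint Petersburg, Russia"')
```

Both results expose the same keys for the shared fields, so downstream code can read Confirmed, Deaths and Recovered without caring which layout the line came from. The sketch leaves the values as strings and does not unify the two date formats.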

Original README.md starts here

2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE

This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). It is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).

Visual Dashboard (desktop): https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6

Visual Dashboard (mobile): http://www.arcgis.com/apps/opsdashboard/index.html#/85320e2ea5424dfaaa75ae62e5c06e61

Please cite our Lancet Article for any use of this data in a publication: An interactive web-based dashboard to track COVID-19 in real time

Provided by Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE): https://systems.jhu.edu/

DONATE to the CSSE dashboard team: https://engineering.jhu.edu/covid-19/support-the-csse-covid-19-dashboard-team/

DATA SOURCES: This list includes all sources ever used in the data set since January 21, 2020. Some sources listed here (e.g., WHO, ECDC, US CDC, BNO News) are not currently relied upon as a source of data.

Embed our dashboard into your webpage:

<style>.embed-container {position: relative; padding-bottom: 80%; height: 0; max-width: 100%;} .embed-container iframe, .embed-container object, .embed-container iframe{position: absolute; top: 0; left: 0; width: 100%; height: 100%;} small{position: absolute; z-index: 40; bottom: 0; margin-bottom: -15px;}</style><div class="embed-container"><iframe width="500" height="400" frameborder="0" scrolling="no" marginheight="0" marginwidth="0" title="COVID-19" src="https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6"></iframe></div>

Acknowledgements: We are grateful to the following organizations for supporting our Center’s COVID-19 mapping and modeling efforts: Financial Support: Johns Hopkins University, National Science Foundation (NSF), Bloomberg Philanthropies, Stavros Niarchos Foundation; Resource support: AWS, Slack, Github; Technical support: Johns Hopkins Applied Physics Lab (APL), Esri Living Atlas team

Additional Information about the Visual Dashboard: https://systems.jhu.edu/research/public-health/ncov/

Contact Us:

Terms of Use:

  1. This data set is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) by the Johns Hopkins University on behalf of its Center for Systems Science and Engineering. Copyright Johns Hopkins University 2020.

  2. Attribute the data as the "COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University" or "JHU CSSE COVID-19 Data" for short, and the url: https://github.com/CSSEGISandData/COVID-19.

  3. For publications that use the data, please cite the following publication: "Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Inf Dis. 20(5):533-534. doi: 10.1016/S1473-3099(20)30120-1"
