nychealth / coronavirus-data

This repository contains data on Coronavirus Disease 2019 (COVID-19) in New York City (NYC), from the NYC Department of Health and Mental Hygiene.

Home Page: https://www1.nyc.gov/site/doh/covid/covid-19-data.page

Clarification of reporting standards - JHU COVID-19 Dashboard

jeremydratcliff opened this issue

Greetings,

My name is Jeremy Ratcliff, and I am a member of the team managing the JHU COVID-19 Dashboard. I have corresponded with the office directly and was advised to post an issue here instead. This issue is motivated principally by the three days (March 21-23) on which this department reported no data because of technical issues.

The total case count on the nyc.gov website, the summary page in this GitHub repo, and the sum of columns B and C in cases-by-day.csv all show the same number (currently 911,332). However, we are having some trouble interpreting the data in cases-by-day.csv and understanding how its management affects the other data files.

The README suggests that all cases in cases-by-day.csv are organized by date of event rather than date of report, and that the file contains no data for the past three days. This raises a few questions for us:

  • Does that mean all NYC data is being reported by date of event with a 3-day lag?
    - If so, is it accurate that no cases diagnosed from April 19-21 have been publicly reported at this point?
  • Does the department have a 'ground truth' for the cases that should have been reported on March 21, 22, and 23?
    - We hope to correct our time series to remove this spike, but it does not appear that we can use cases-by-day.csv for this purpose because of its frequent revisions.

Please let me know if this request requires further detail/clarification. Thank you very much in advance!

Hi, thanks for the questions.

Your interpretation of the reporting lag is correct. With the exception of a few tables in the repository, most data are updated daily with a 3-day lag. All data are shown by event date (i.e., date of diagnosis, hospitalization, or death). As such, data posted on April 22 will contain no events that occurred on April 20-22. These lags account for standard delays in receiving reports of cases, hospitalizations, and deaths. More explanation of reporting lags and event dates is available in the technical notes of the README.
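As a rough sketch of the 3-day lag described above, assuming the lag is exactly three calendar days counted back from the posting date, the event dates withheld from a given posting can be computed as follows. The helper name `reporting_window` is hypothetical and is not part of this repository.

```python
from datetime import date, timedelta

# Assumption: a fixed 3-day lag, matching the example of data posted on
# April 22 omitting events from April 20-22. Not an official NYC DOHMH tool.
LAG_DAYS = 3


def reporting_window(posted: date, lag_days: int = LAG_DAYS):
    """Hypothetical helper: return (last_event_date_shown, withheld_event_dates)."""
    # Events on the posting date and the (lag_days - 1) days before it are withheld.
    withheld = [posted - timedelta(days=offset) for offset in range(lag_days - 1, -1, -1)]
    # The most recent event date that still appears in the posted data.
    last_shown = posted - timedelta(days=lag_days)
    return last_shown, withheld


if __name__ == "__main__":
    last, missing = reporting_window(date(2020, 4, 22))
    print("Last event date shown:", last.isoformat())
    print("Withheld event dates:", ", ".join(d.isoformat() for d in missing))
```

For a posting on April 22, this reports April 19 as the last event date shown and April 20, 21, and 22 as withheld.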

Though we know that aggregators like JHU use date of report for consistency across many jurisdictions, we recommend against taking the difference between a cumulative file and its previous version and interpreting that as trend data. That difference reflects when we learned about events, not when they actually happened. Our approach of using event date helps address unexpected delays in the transmission of surveillance data, such as the technical issues reported in late March. Because we aggregate and analyze data by date of event (not date of report), we do not have a formal estimate of the number of cases that would have been reported during March 21-23 had the data transmission issues not occurred. Instead, we backfill and report the cases actually diagnosed on those days.
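To illustrate why differencing cumulative files can mislead, here is a minimal sketch using invented numbers (not NYC data) and ignoring the 3-day lag for simplicity. Each hypothetical snapshot stands for one posted version of a file listing case counts by event date. Differencing the cumulative totals of consecutive snapshots gives a report-date series, which shows the zeros-then-spike pattern of a transmission outage; the backfilled event-date counts from the latest snapshot do not.

```python
# Hypothetical snapshots: each maps an event date (day number) to the case
# count known when that version was posted. All numbers are invented.
snapshots = [
    {1: 10},                               # posted day 1
    {1: 10, 2: 12},                        # posted day 2
    {1: 10, 2: 12},                        # posted day 3: outage, nothing new arrives
    {1: 10, 2: 12},                        # posted day 4: outage continues
    {1: 10, 2: 12, 3: 11, 4: 13, 5: 12},   # posted day 5: backlog arrives at once
]


def report_date_series(snaps):
    """Difference consecutive cumulative totals, i.e. 'new reports' per posting."""
    totals = [sum(s.values()) for s in snaps]
    return [totals[0]] + [later - earlier for earlier, later in zip(totals, totals[1:])]


def event_date_series(latest):
    """Case counts by event date, taken from the most recent backfilled snapshot."""
    return [latest[day] for day in sorted(latest)]


print("By report date:", report_date_series(snapshots))
print("By event date: ", event_date_series(snapshots[-1]))
```

Here the report-date series is 10, 12, 0, 0, 36, whereas the event-date series is 10, 12, 11, 13, 12: the same cases, attributed to when they were diagnosed rather than when they reached the department.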

We hope this helps!