nychealth / coronavirus-data

This repository contains data on Coronavirus Disease 2019 (COVID-19) in New York City (NYC), from the NYC Department of Health and Mental Hygiene.

Home Page: https://www1.nyc.gov/site/doh/covid/covid-19-data.page


Calculation error in NYC 28-day-average daily percent positive?

hasedgwick opened this issue · comments

Hello. On https://www1.nyc.gov/site/doh/covid/covid-19-data.page, under Summary, the Percent Positive "Last 28 days" daily average is way too high (today, 4/18, it is listed as 4.49%). Please check the calculation formula. These values have looked too high for some time. I have not located where this 28-day average is tabulated, but I am looking at tests.csv for the daily percent positive and for the 7-day-average daily percent positive, and the 28-day-average daily percent positive on the COVID-19 data page is too high to be consistent with those values. Moreover, the percent positive trend is listed as Stable today, but the tests.csv 7-day average clearly shows that it is still increasing steadily. Thank you for taking a look at this.

Hi, thanks for your question. There's no calculation error. The 28-day percent positive value may differ from hand calculations based on daily data because it is not an average of the daily percent positive values over that 28-day period. As with other percent positivity data, it's de-duplicated by person for the time period in question.

The Increasing/Decreasing flags are triggered if the 7-day value is more than 10% above or below the 28-day value. This offers a very high-level comparison of the last week of data against the last month of data. Slow but steady increases (as we're seeing with percent positivity) might not exceed this threshold.
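To make the flag rule concrete, here is a minimal sketch in Python. It assumes the 10% threshold is relative to the 28-day value (not 10 percentage points); the function name `trend_flag` and the 7-day values in the usage lines are hypothetical, chosen only for illustration, and this is not the code behind the data page.

```python
def trend_flag(seven_day: float, twenty_eight_day: float,
               threshold: float = 0.10) -> str:
    """Compare a 7-day value with a 28-day value.

    Assumption: "more than 10% above or below" is read as a relative
    difference, i.e. (7-day - 28-day) / 28-day.
    """
    if twenty_eight_day <= 0:
        raise ValueError("28-day value must be positive")
    relative_change = (seven_day - twenty_eight_day) / twenty_eight_day
    if relative_change > threshold:
        return "Increasing"
    if relative_change < -threshold:
        return "Decreasing"
    return "Stable"


# 4.49 is the 28-day value quoted in the question; the 7-day values
# below are made-up examples.
print(trend_flag(4.8, 4.49))  # about 7% above the 28-day value
print(trend_flag(5.0, 4.49))  # about 11% above the 28-day value
```

Under this reading, a 7-day value can sit above the 28-day value for weeks and still be reported as Stable, as long as the gap stays within 10%.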

Hope this helps clear up any confusion.

The 28-day values are calculated in our internal systems from protected person-level data. The difference in values is rooted in the fact that in tests.csv a person may be counted once each day ("de-duplicated by day"), whereas the 28-day value de-duplicates over the whole period and counts each person only once in those 28 days. Again, this is "people tested who tested positive" and not just "percent of tests that are positive."
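The effect of the two de-duplication windows can be illustrated with a toy example. The sketch below uses invented records of the form `(person_id, day, is_positive)` and assumes a person counts as positive for the period if any of their tests in it was positive; the function names are hypothetical and the real person-level data and exact rules are not public.

```python
from collections import defaultdict
from statistics import mean

# Toy data: (person_id, day, is_positive). Entirely made up.
records = [
    ("A", 1, True),  ("B", 1, False), ("C", 1, False),
    ("B", 2, False), ("C", 2, False), ("D", 2, True),
    ("B", 3, False), ("C", 3, False),
]


def percent_positive_by_day(rows):
    """De-duplicate by day: each person counts at most once per day."""
    tested = defaultdict(set)
    positive = defaultdict(set)
    for person, day, is_pos in rows:
        tested[day].add(person)
        if is_pos:
            positive[day].add(person)
    return {day: 100.0 * len(positive[day]) / len(tested[day])
            for day in tested}


def percent_positive_by_period(rows):
    """De-duplicate over the whole period: each person counts once."""
    tested = set()
    positive = set()
    for person, _day, is_pos in rows:
        tested.add(person)
        if is_pos:
            positive.add(person)
    return 100.0 * len(positive) / len(tested)


daily = percent_positive_by_day(records)
print(round(mean(daily.values()), 1))       # average of daily values
print(percent_positive_by_period(records))  # period-level value
```

In this toy data, B and C test negative every day, so they pull each daily value down three times but are counted only once in the period-level figure. That is one way a period value can come out higher than the average of the daily values, though the actual size of the gap depends on real testing patterns.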