nytimes / covid-19-data

A repository of data on coronavirus cases and deaths in the U.S.

Home Page: https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html

Data Issue: cases_avg has incorrect number for USA on 2022-01-27

tushartify opened this issue

The cases_avg value of 589224 for the USA on 2022-01-27 in the rolling_avg feed appears to be incorrect. Calculated from new_daily_cases, the simple 7-day average is 586948.

As per the Rolling Averages definition, cases_avg is the average number of new cases reported over the most recent seven days of data, in other words, the seven-day trailing average.
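
For reference, the plain seven-day trailing average described here can be sketched as below. The function name `trailing_average` and the sample numbers are made up for illustration; they are not taken from the feed.

```python
from typing import Sequence


def trailing_average(daily_counts: Sequence[float], window: int = 7) -> float:
    """Mean of the most recent `window` daily values (hypothetical helper)."""
    if len(daily_counts) < window:
        raise ValueError("need at least `window` days of data")
    # Only the last `window` days contribute; older days are ignored.
    recent = daily_counts[-window:]
    return sum(recent) / window


# Illustrative values only, oldest first.
sample = [1000, 7, 14, 21, 28, 35, 42, 49]
print(trailing_average(sample))  # 28.0
```

This version assumes every day has a reported value, which is the assumption the reply below says does not hold for this data.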

Please check as this seems to be a calculation issue.

Hi @tushartify,

If you read further in that definition section, there is an explanation of how the rolling averages are adjusted to account for anomalies and missing data:

Because many agencies do not report data every day, variation in the schedule on which cases or deaths are reported, such as around holidays, can cause irregular patterns in a simple seven-day trailing average.

To adjust for this in our averages, the number of days included in the average may be extended if there are days within the time range with no data reported. The average is extended to older days until at least seven days of data are included.

If the most recent days have no data reported, then the average is extended further back until seven days' worth of data are included. Data reported on a day that follows one or more days with no data reported is assumed to represent multiple days' worth of data. In any average, that day and all non-reporting days preceding it are always included together in the average. This may cause some averages to include more than seven days.
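
As a rough illustration, here is one possible reading of that rule in Python. This is a sketch under stated assumptions, not the code used to produce the feed: `extended_average` is a hypothetical name, `None` marks a day with no data reported, and the total is divided by the number of calendar days the window ends up covering.

```python
from typing import Optional, Sequence


def extended_average(daily: Sequence[Optional[float]], min_days: int = 7) -> float:
    """Trailing average whose window grows over non-reporting days.

    Assumption: None means "no data reported" for that day.
    """
    i = len(daily) - 1
    # Assumption: the most recent non-reporting days are skipped,
    # since no later report covers them yet.
    while i >= 0 and daily[i] is None:
        i -= 1

    total = 0.0
    days = 0
    while i >= 0 and days < min_days:
        # daily[i] is a reporting day here.
        total += daily[i]
        days += 1
        i -= 1
        # The report is assumed to cover the non-reporting days just
        # before it, so they always join the window together with it.
        while i >= 0 and daily[i] is None:
            days += 1
            i -= 1

    if days < min_days:
        raise ValueError("not enough history to cover min_days")
    return total / days


# Illustrative values only, oldest first. The report of 24 follows two
# non-reporting days, so the window grows to nine days: 72 / 9.
print(extended_average([None, None, 24, 8, 8, 8, 8, 8, 8]))  # 8.0
```

With gaps in the series, the result can differ from a plain seven-day average over the same dates, because the window may be longer than seven days.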

For the U.S. national case and death count averages, the average is the sum of the average number of cases and deaths in all states and territories each day. Because this accounts for irregularly timed case and death reports at the state level, it may not match the average calculated from the U.S. case and death totals.
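
A small sketch of why the two numbers can differ, using invented data for two hypothetical states. All names here (`state_average`, `national_average`, `average_of_national_totals`) are illustrative, and `state_average` is deliberately simplified: it only handles non-reporting days at the end of a series, marked as `None`.

```python
from typing import Dict, List, Optional


def state_average(daily: List[Optional[float]], window: int = 7) -> float:
    """Simplified per-state average: skip trailing non-reporting days."""
    reported = list(daily)
    while reported and reported[-1] is None:
        reported.pop()
    recent = reported[-window:]
    if len(recent) < window or any(v is None for v in recent):
        raise ValueError("sketch only handles gaps at the end of the series")
    return sum(recent) / window


def national_average(states: Dict[str, List[Optional[float]]]) -> float:
    """Sum of the per-state averages, as the definition describes."""
    return sum(state_average(series) for series in states.values())


def average_of_national_totals(
    states: Dict[str, List[Optional[float]]], window: int = 7
) -> float:
    """Plain trailing average of the summed daily totals."""
    n_days = len(next(iter(states.values())))
    totals = [
        sum(series[d] or 0 for series in states.values()) for d in range(n_days)
    ]
    return sum(totals[-window:]) / window


# Invented data, oldest first: state B has not reported the latest day.
states = {"A": [10] * 8, "B": [20] * 7 + [None]}
print(national_average(states))            # 30.0
print(average_of_national_totals(states))  # about 27.14 (190 / 7)
```

In this made-up case the plain average of the national totals is pulled down by the day state B has not reported, while the sum of state averages is not.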

See the methodology section for a more detailed discussion of how single-day reporting anomalies affect the average.