nychealth / coronavirus-data

This repository contains data on Coronavirus Disease 2019 (COVID-19) in New York City (NYC), from the NYC Department of Health and Mental Hygiene.

Home Page: https://www1.nyc.gov/site/doh/covid/covid-19-data.page

Please help connect May and September antibody data; there appears to be an error

laurarpg opened this issue

I see here an updated data set for antibody positive tests: https://github.com/nychealth/coronavirus-data/blob/master/totals/antibody-by-group.csv posted in May, and the prior one posted in September is here: https://github.com/nychealth/coronavirus-data/blob/master/totals/antibody-by-age.csv

These don't square though - if you take the May data set and compare to September, for ages 0-4, you see that:

Number of additional tests from September to May = 45196-43601 = 1595
Number of additional positives from September to May = 14431 - 12834 = 1597

This means that every single antibody test done from September to May resulted in a positive test (plus a couple of bonus positive tests). I can't imagine this is accurate?

I did a similar comparison of the data for 25-34 year olds:

Number of additional tests from September to May = 591638 - 565318 = 26320
Number of additional positives from September to May = 194851 - 162340 = 32511

In this case, for this age range there are substantially more new positive antibody tests than new people tested.
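
The differences above can be reproduced with a short script. This is only a sketch: the counts are hard-coded from the figures quoted in this post rather than read from the CSV files, and the dictionary layout and the `deltas` function name are illustrative.

```python
# Cumulative counts quoted in this post, as (September, May) pairs per age group.
counts = {
    "0-4": {"tested": (43601, 45196), "positive": (12834, 14431)},
    "25-34": {"tested": (565318, 591638), "positive": (162340, 194851)},
}


def deltas(group):
    """Return (additional tested, additional positive) between the two snapshots."""
    tested_sep, tested_may = group["tested"]
    positive_sep, positive_may = group["positive"]
    return tested_may - tested_sep, positive_may - positive_sep


for age, group in counts.items():
    new_tested, new_positive = deltas(group)
    # Flag groups where the positive count grew by more than the tested count.
    print(age, new_tested, new_positive, new_positive > new_tested)
```

Both age groups are flagged, which is what prompted the question.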

Please can you help me understand what's going on with this data?

Thanks for the question.

The answer lies in the fact that, per documentation, the data are person-level. That means that the data are not the number of tests, but the number of people tested.

Because many people are tested more than once over time, the number of people who have tested positive can grow without the number of people tested growing: somebody who initially tested negative, later seroconverted, and was tested again appears in the data as a new "person testing positive" but not as a new "person tested."
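
A toy example may make the person-level counting concrete. This is a minimal sketch under assumptions: the record format, the person IDs, and the `person_level_counts` function are hypothetical, and the department's actual deduplication logic is not described in this thread.

```python
def person_level_counts(test_records):
    """Count distinct people tested and distinct people who ever tested positive.

    test_records is an iterable of (person_id, result) pairs,
    where result is "pos" or "neg" (a hypothetical format).
    """
    tested = set()
    positive = set()
    for person_id, result in test_records:
        tested.add(person_id)
        if result == "pos":
            positive.add(person_id)
    return len(tested), len(positive)


# Hypothetical records up to the first snapshot: person "a" tests negative.
first_snapshot = [("a", "neg"), ("b", "pos")]
# Before the second snapshot, person "a" seroconverts and is tested again.
second_snapshot = first_snapshot + [("a", "pos")]

print(person_level_counts(first_snapshot))   # (2, 1)
print(person_level_counts(second_snapshot))  # (2, 2)
```

Between the two snapshots the number of people positive rises by one while the number of people tested does not change, which is the pattern seen in the cumulative totals.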

These data can't help approximate the percent of New Yorkers currently seropositive. They're too heavily influenced by testing bias. There's no reason to assume that those who have chosen to get tested are a representative sample of all New Yorkers, and cumulative data on nearly 2 years of antibody testing can offer very limited insight into who is currently seropositive.