nychealth / coronavirus-data

This repository contains data on Coronavirus Disease 2019 (COVID-19) in New York City (NYC), from the NYC Department of Health and Mental Hygiene.

Clarification ZCTA/MODZCTA

mariepastora opened this issue · comments


I'd just like to get some clarification on MODZCTA vs ZCTA.

  • As far as I understand, the vaccine data per ZIP Code dashboard is reporting numbers by MODZCTA. It's specified in the subhead of the chart: "This map shows the percent of NYC adult residents partially and fully vaccinated by modified ZIP Code Tabulation Area." However, upon downloading the data, one column header says "ZCTA num". Is that expected? It's a bit confusing, so I just want to double-check that the data is actually per MODZCTA, and not per ZCTA.
  • Is it the same for the total case/death counts reported by ZIP? I don't see a similar note. Are these per MODZCTA or per ZCTA? Upon downloading, the column header says MODZCTA, but I just want to double-check here too.
  • Finally, regarding the ZCTA to MODZCTA mapping: if one MODZCTA can contain several ZCTAs, I'm wondering if the opposite is true. Can the total population / demographics of a MODZCTA be obtained by summing the population / demographics of all the ZCTAs that compose it? If I take the latest census data per ZCTA for NYC from the U.S. Census Bureau, is it correct to sum the demographics of all the ZCTAs within the same MODZCTA?
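For anyone wanting to try the summing approach from the third point, here is a minimal Python sketch. It assumes a crosswalk table with one row per ZCTA; the `ZCTA` and `MODZCTA` column names, the mapping rows, and the population values are all illustrative placeholders, not real NYC data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical crosswalk: one row per ZCTA, naming the MODZCTA it belongs to.
# Placeholder codes only; this is not the real NYC mapping.
CROSSWALK_CSV = """ZCTA,MODZCTA
00001,00001
00002,00001
00003,00003
"""

# Hypothetical ZCTA-level population totals (e.g. from an ACS extract).
zcta_population = {"00001": 1000, "00002": 250, "00003": 4000}


def sum_by_modzcta(crosswalk_csv, population_by_zcta):
    """Sum ZCTA populations into their parent MODZCTA.

    Returns (totals, missing), where `missing` lists crosswalk ZCTAs that
    have no population value, so gaps are visible instead of silently zero.
    """
    totals = defaultdict(int)
    missing = []
    for row in csv.DictReader(io.StringIO(crosswalk_csv)):
        zcta, modzcta = row["ZCTA"], row["MODZCTA"]
        if zcta in population_by_zcta:
            totals[modzcta] += population_by_zcta[zcta]
        else:
            missing.append(zcta)
    return dict(totals), missing


totals, missing = sum_by_modzcta(CROSSWALK_CSV, zcta_population)
print(totals)
print(missing)
```

The sum is only meaningful if every ZCTA maps to exactly one MODZCTA, which is the very assumption this question is about, so it is worth checking the crosswalk for duplicate ZCTA rows first.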


@mariepastora you are correct: if there are ~178 or so areas, it's usually Modified ZIP Code Tabulation Area (MODZCTA). I've been working w/ the vaccine data and can confirm it. I am assuming your third point is correct.

Thanks @nygeog that's so helpful!
Yeah I am a bit worried about the third one, because the pop estimates per MODZCTA that are in the vaccination file of NYC Health are quite different from the ones I'm getting when summing up 2015-2019 ACS total pop estimates for all ZCTAs in the same MODZCTA.
And sure, it might just be that NYC Health is using a different survey, but I'm not sure that explains all the discrepancies.
To illustrate, left: NYC Health estimates, right: based on ACS 2015-2019.
Screen Shot 2021-02-19 at 12 14 10 AM
Not sure what is going on here.

I see, thanks! So the estimates from NYC Health are based on PEPANNRES? It's just that it still seems like a pretty wild variation for some, and not at all for others.
11691, for example: 48,414 vs 68,543.
But 10001: 25,537 vs 24,117.
So I'm just wondering if I should rethink this approach and, if so, how, haha.
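To put the two examples on the same scale, here is a small Python sketch of the relative difference. It assumes the first number in each pair is the NYC Health estimate and the second is the ACS 2015-2019 sum, following the left/right order of the screenshot; the helper name `pct_diff` is just for illustration.

```python
# Figures quoted in this thread. Assumption: in each pair the first number
# is the NYC Health estimate and the second is the ACS 2015-2019 sum.
pairs = {
    "11691": (48_414, 68_543),
    "10001": (25_537, 24_117),
}


def pct_diff(doh, acs):
    """Difference of the DOH figure from the ACS figure, as a percent of ACS."""
    return (doh - acs) / acs * 100


diffs = {code: round(pct_diff(doh, acs), 1) for code, (doh, acs) in pairs.items()}
for code, pct in diffs.items():
    print(f"{code}: {pct:+.1f}%")
```

Under that assumption, 11691 comes out roughly 29% below the ACS figure while 10001 is about 6% above it, which is the uneven variation described above.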

I've been using 2018 5-year ACS data (I started my collection pipeline before the 2019 release, and will switch when I have a chance), and for 11691 I'm also seeing a similarly large difference.

Screen Shot 2021-02-19 at 7 18 00 AM

@mariepastora are you seeing similar differences? See pop_dif column, far right. These are fairly large differences.

Screen Shot 2021-02-19 at 7 17 46 AM

@igorgeyn do these population differences match your experience?

@igorgeyn the DOH data shows the NYC population as 6,632,698. I generally think of NYC as having a population of at least 8 million.

Screen Shot 2021-02-19 at 9 49 20 AM

@igorgeyn is your POPULATION_ESTIMATE only the population 18+?

Yes, I do get similar differences @nygeog, which are pretty massive... From what I can see, the DOH estimates are only for adults over 18. It still seems like a lot? Thanks for showing your numbers!

(POPULATION_ESTIMATE is DOH, other col is ACS 5Y 2015-2019)
Screen Shot 2021-02-19 at 3 48 12 PM

Yeah, it looks like they are using adult population:

Screen Shot 2021-02-19 at 10 49 54 AM

I don't know which denominator makes the most sense to report on: people under 18 can still spread the virus, but they are not eligible for vaccination.
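To see how much the choice matters, here is a minimal Python sketch comparing the two denominators. Every number in it is made up for a hypothetical area, and `coverage` is an illustrative helper, not anything from the DOH files.

```python
def coverage(vaccinated, denominator):
    """Percent of the denominator population that is vaccinated."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return vaccinated / denominator * 100


# Hypothetical area: all numbers below are made up for illustration.
vaccinated_adults = 3_000
adult_population = 20_000   # residents 18+
total_population = 25_000   # all residents

adult_rate = coverage(vaccinated_adults, adult_population)
total_rate = coverage(vaccinated_adults, total_population)
print(f"{adult_rate:.1f}% of adults, {total_rate:.1f}% of all residents")
```

The same vaccinated count reads as a lower rate against the all-residents denominator, so whichever one is reported should be stated next to the number.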