cmsetzer / autocensus

Python package for collecting ACS and geospatial data from the Census API

Geography merge creates duplicate rows for certain geo_types

dylan-knaggs opened this issue

The following query results in duplicate rows due to the geography merge (duplication doesn't occur when join_geography is False):

from autocensus import Query

zcta = ['94964', '94929', '94933']
# Build for_geo strings of the form 'zip code tabulation area:<ZCTA>'
zcta_final = ['zip code tabulation area:' + i for i in zcta]
zip_query = Query(
    estimate=5,
    years=[2018],
    variables='DP03_0064E',
    for_geo=zcta_final
)
zip_dataframe = zip_query.run()
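One way to confirm the duplication is to count how many rows share each combination of identifying columns. This is a minimal sketch using plain pandas. The column names `geo_id` and `year` are assumptions about the returned dataframe, not confirmed autocensus output, and `find_duplicate_keys` is a hypothetical helper. A toy frame stands in for the real query result.

```python
import pandas as pd


def find_duplicate_keys(df: pd.DataFrame, key_cols: list) -> pd.DataFrame:
    """Return each key combination that appears more than once, with its count.

    `key_cols` should identify one logical row (hypothetically geo_id + year);
    any count above 1 signals that a merge multiplied rows.
    """
    counts = df.groupby(key_cols).size().rename("n_rows").reset_index()
    return counts[counts["n_rows"] > 1].reset_index(drop=True)


# Toy frame standing in for a query result: every ZCTA appears three times.
toy = pd.DataFrame({
    "geo_id": ["94964", "94929", "94933"] * 3,
    "year": [2018] * 9,
})
print(find_duplicate_keys(toy, ["geo_id", "year"]))
```

An empty result means every key appears exactly once, so running the same check with `join_geography` set to False should come back empty if the merge is the culprit.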

Each geometry appears more than once, so the query above yields 9 rows rather than 3. This also happens with places and possibly other geography types. From some light investigation it's not clear to me what causes this, though my guess is duplicates in the shapefile that persist from https://github.com/socrata/autocensus/blob/62f0a0f01e119282e987ede630eccf9f1c088762/autocensus/query.py#L383.
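The suspected mechanism can be reproduced in isolation: if the geography table contains repeated keys, a merge emits one output row per matching key, so each data row is repeated. The sketch below uses plain pandas with made-up column names (`geo_id`, `geometry`) and placeholder geometry strings. It does not reproduce the actual autocensus merge code at the linked line, and `merge_geography` is a hypothetical helper. Deduplicating the geography table on the join key, and passing `validate="many_to_one"` so pandas raises `MergeError` if repeated keys slip through, is one possible guard.

```python
import pandas as pd

# Data as returned from the API: one row per ZCTA.
data = pd.DataFrame({
    "geo_id": ["94964", "94929", "94933"],
    "estimate": [1, 2, 3],  # placeholder values, not real ACS figures
})

# Geography table in which every key appears three times (the suspected bug).
shapes = pd.DataFrame({
    "geo_id": ["94964", "94929", "94933"] * 3,
    "geometry": ["shape-a", "shape-b", "shape-c"] * 3,  # placeholder geometries
})

naive = data.merge(shapes, on="geo_id", how="left")
print(len(naive))  # 9: each data row matches three geography rows


def merge_geography(data: pd.DataFrame, shapes: pd.DataFrame, key: str = "geo_id") -> pd.DataFrame:
    """Hypothetical helper: attach geometries without multiplying data rows."""
    deduped = shapes.drop_duplicates(subset=key)
    # validate="many_to_one" raises pandas.errors.MergeError if `deduped`
    # still has repeated keys, instead of silently duplicating rows.
    return data.merge(deduped, on=key, how="left", validate="many_to_one")


fixed = merge_geography(data, shapes)
print(len(fixed))  # 3
```

`drop_duplicates` keeps the first geometry for each key, which is only safe if the repeated shapefile rows are true copies; if they differ, they would need to be investigated (or dissolved) before merging.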

@cmsetzer, I'd be happy to investigate this if nothing comes to mind off the top of your head. I may not get to it for a few weeks, but I don't think it's overly time-sensitive.

I believe we've resolved this in #15. Thanks again for catching it!