Geography merge creates duplicate rows for certain geo_types
dylan-knaggs opened this issue · comments
The following query results in duplicate rows due to the geography merge (duplication doesn't occur when join_geography is False):
from autocensus import Query

# ZCTAs to request, formatted as "<geo_type>:<code>" strings
zcta = ['94964', '94929', '94933']
zcta_final = ['zip code tabulation area:' + i for i in zcta]
zip_query = Query(
    estimate=5,
    years=[2018],
    variables='DP03_0064E',
    for_geo=zcta_final
)
zip_dataframe = zip_query.run()
There is one duplicate per geometry, so the above yields 9 rows rather than 3. This issue also occurs with places and perhaps other geography types. From some very light investigation, it's not clear to me what causes this, though my guess would be duplicates in the shapefile that persist from https://github.com/socrata/autocensus/blob/62f0a0f01e119282e987ede630eccf9f1c088762/autocensus/query.py#L383.
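To illustrate that guess, here is a minimal pandas sketch of how duplicate keys on the geometry side of a merge would multiply rows. It does not call autocensus: the frames `estimates` and `shapes`, the `geo_id` column name, and all values are made-up stand-ins, and the threefold repetition is chosen only so the toy result matches the 9 rows seen above. It is not a claim about what the shapefile actually contains.

```python
import pandas as pd

# Hypothetical stand-in for the Census API rows (placeholder values).
estimates = pd.DataFrame({
    "geo_id": ["94964", "94929", "94933"],
    "value": [1, 2, 3],
})

# Hypothetical stand-in for the shapefile table, with every record
# repeated three times to mimic duplicated geometries.
shapes = pd.DataFrame({
    "geo_id": ["94964", "94929", "94933"] * 3,
    "geometry": ["shape A", "shape B", "shape C"] * 3,
})

# A left merge emits one output row per matching right-hand row,
# so duplicated keys in `shapes` multiply the estimate rows.
merged = estimates.merge(shapes, on="geo_id", how="left")

# Count the repeated keys on the geometry side.
n_duplicate_keys = int(shapes.duplicated(subset="geo_id").sum())

# Dropping repeated keys before merging restores one row per estimate.
# validate="many_to_one" raises pandas.errors.MergeError if the
# right-hand keys are not unique.
deduped = shapes.drop_duplicates(subset="geo_id")
fixed = estimates.merge(deduped, on="geo_id", how="left", validate="many_to_one")

print(len(merged), n_duplicate_keys, len(fixed))
```

If the real cause is the same, passing `validate="many_to_one"` to the geography merge would surface it as an error rather than as silently duplicated rows.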
@cmsetzer, I'd be happy to take on investigating this if you don't have any thoughts off the top of your head. I may not get to it for a few weeks, but I don't think it's overly time-sensitive.