cenpy-devs / cenpy

Explore and download data from Census APIs


explore pygris migration/integration

ljwolf opened this issue

@walkerke's pygris package was just released. Something I have been quite concerned about in Python is the fragmentation of effort across many different census data packages, so I think we need to explore consolidating work around a few better-supported packages.

So, I'm hoping to explore pygris to see how/whether we can offer migration and/or integration examples of the two packages.

Hi @ljwolf! Thanks for the ping and for checking out pygris. It would be great to brainstorm ways to collaborate. A few thoughts:

  • Regarding a migration case, I see value in pygris as a standalone geometry-focused package, independent of Census data-focused packages. This is the space tigris occupies on the R side, and a lot of people use tigris in non-Census data projects. My original motivation for pygris (aside from wanting to learn Python development) was Rafael's work on parallel R/Py implementations in geobr.
  • I do have a couple of data helpers in the package (Census APIs and LODES; I've mulled over BLS as well) for users who want some data to merge to their shapes without leaving the package. I don't envision these as full-featured, however, and I recommend cenpy in the package documentation (and in the function docstring itself) for users who want a full-featured, Pythonic interface to the Census APIs.

Regarding integration, I think there could be some interesting directions to look at. One potential area is pygris as an optional geometry engine for cenpy, similar to how I have it set up in tidycensus. As pygris uses the shapefiles from the FTP server instead of the TigerWeb API, it has a few nice features:

  • You can get the cartographic boundary files in addition to the TIGER/Line files. I've found the cartographic boundary files are typically what users want for mapping in coastal regions, and as far as I can tell, they aren't on the API;
  • You can cache downloaded shapefiles, which is quite helpful when working with large datasets (like Census blocks);
  • To that point, you could avoid errors associated with large data pulls from TigerWeb (like #126, #150)
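
To make the "optional geometry engine" idea a bit more concrete, here is a minimal sketch of what a thin wrapper might look like. The helper name `fetch_tract_shapes` and its `loader` argument are hypothetical and not part of cenpy or pygris; the only real call assumed is `pygris.tracts` with its `state`, `county`, `year`, `cb`, and `cache` arguments, as I understand them from the pygris documentation. pygris is imported lazily so it stays an optional dependency.

```python
def fetch_tract_shapes(state, county=None, year=None,
                       cartographic=True, cache=True, loader=None):
    """Hypothetical helper: fetch tract geometries through an optional engine.

    `loader` is any callable accepting the keyword arguments below. When it
    is omitted, pygris.tracts is used (assumption: pygris is installed and
    its signature matches the keywords passed here).
    """
    if loader is None:
        try:
            # Imported lazily so pygris remains an optional dependency.
            from pygris import tracts as loader
        except ImportError as exc:
            raise ImportError(
                "pygris is not installed; install it or pass a custom loader"
            ) from exc
    # cb=True requests the cartographic boundary file rather than TIGER/Line;
    # cache=True reuses a previously downloaded shapefile when one exists.
    return loader(state=state, county=county, year=year,
                  cb=cartographic, cache=cache)
```

With pygris installed, a call such as `fetch_tract_shapes("CA", county="San Francisco", year=2021)` would be expected to return a GeoDataFrame of tracts. Passing a different `loader` leaves room for other geometry sources, TigerWeb included.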

It might also be interesting to explore some updates to the detailed usage vignette that show how the packages could work together. For example, censusapi and tigris work quite well together on the R side.
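
As a rough illustration of the glue such a vignette might need: Census API responses identify tracts with separate `state`, `county`, and `tract` columns, while tract shapefiles typically carry a single 11-digit `GEOID` (2-digit state, 3-digit county, 6-digit tract). The sketch below builds that join key in plain Python. The function names are hypothetical, and the exact column names on either side are assumptions that may vary by dataset and year.

```python
def tract_geoid(state, county, tract):
    """Build an 11-character tract GEOID from its FIPS components.

    Components are zero-padded to 2, 3, and 6 characters respectively.
    """
    return f"{str(state).zfill(2)}{str(county).zfill(3)}{str(tract).zfill(6)}"


def attach_geoid(rows):
    """Return copies of the row dicts with a 'GEOID' key added.

    Assumes each row has 'state', 'county', and 'tract' keys, as in a
    tract-level Census API response (an assumption, not a guarantee).
    """
    return [
        {**row, "GEOID": tract_geoid(row["state"], row["county"], row["tract"])}
        for row in rows
    ]
```

Once both tables share a `GEOID` column, a standard pandas/GeoPandas merge along the lines of `shapes.merge(data, on="GEOID")` should attach the attributes to the geometries.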