Python tools for creating and maintaining Parquet files from US 2020 Census Data.
To use the data download shell scripts, first install wget.
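A quick pre-flight check can confirm that wget is on your PATH before you run the download scripts. This is a minimal sketch that uses only the Python standard library. The helper name `wget_available` is hypothetical and is not part of census-parquet.

```python
import shutil


def wget_available() -> bool:
    """Return True if a `wget` executable is found on the PATH (hypothetical helper)."""
    return shutil.which("wget") is not None


if __name__ == "__main__":
    if wget_available():
        print("wget found; the download scripts can run.")
    else:
        print("wget not found; install it before running the download scripts.")
```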
To install the census-parquet package, use:
pip install census-parquet
This also installs the required Python dependencies, which are:
To run the census-parquet code, simply use:
run_census_parquet
This runs the following scripts in order:
download_boundaries.sh
- This script downloads the Census Boundary data needed to run process_boundaries.py.
download_population_stats.sh
- This script downloads the population statistics data needed for process_blocks.py.
download_blocks.sh
- This script downloads the Census Block data needed to run process_blocks.py.
process_boundaries.py
- This script processes the Census Boundary data and creates parquet files. The parquet files are written to a boundary_outputs folder.
process_blocks.py
- This script processes the Census Block data and creates parquet files. The final combined parquet file is named tl_2020_FULL_tabblock20.parquet.
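The ordering above matters because each processing step reads files produced by the earlier download steps. The sketch below illustrates that ordering with a fail-fast runner. It is not how run_census_parquet is actually implemented; the helper names (`PIPELINE`, `build_command`, `run_pipeline`), the use of `bash` for the .sh files, and the assumption that all five scripts sit in one directory are illustrative choices only.

```python
import subprocess
import sys
from pathlib import Path

# Step order taken from the list above. How run_census_parquet really
# invokes these scripts is an assumption; this is only an illustration.
PIPELINE = [
    "download_boundaries.sh",
    "download_population_stats.sh",
    "download_blocks.sh",
    "process_boundaries.py",
    "process_blocks.py",
]


def build_command(script: str, script_dir: Path) -> list[str]:
    """Return the command that would run one pipeline step (hypothetical helper)."""
    path = script_dir / script
    if script.endswith(".sh"):
        return ["bash", str(path)]
    # Run Python steps with the same interpreter that runs this sketch.
    return [sys.executable, str(path)]


def run_pipeline(script_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Run each step in order, stopping at the first failure.

    With dry_run=True nothing is executed; the commands are only collected.
    """
    script_dir = script_dir.resolve()  # absolute paths stay valid with cwd below
    commands = []
    for script in PIPELINE:
        cmd = build_command(script, script_dir)
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError if a step fails, so later
            # steps never run against missing inputs.
            subprocess.run(cmd, check=True, cwd=script_dir)
    return commands


if __name__ == "__main__":
    for cmd in run_pipeline(Path("."), dry_run=True):
        print(" ".join(cmd))
```

Failing fast with `check=True` keeps a broken download from silently producing incomplete parquet output further down the pipeline.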