cfpb / hmda-census

ETL for geographic and Census data used by the HMDA Platform

HMDA Census Geographic and Demographic Data

Repository Purpose

  • Provide an ETL for geographic and Census data used by the HMDA Platform
  • Check to ensure the accuracy of Census data in the HMDA Platform

Requirements, Setup, and Running the Code

Install Requirements: The code is written in Python 3.x. The following packages are also required and can be installed using the commands listed.

  • Python 3.6 or greater
  • set up a virtual environment if desired: virtualenv venv
    • turn on virtual environment: source venv/bin/activate
    • turn off virtual environment: deactivate
  • install required packages: pip install -r requirements.txt
  • Note: to load data files to a database, you must have one installed locally. This code has been tested with PostgreSQL.

Creating Yearly Census File for the HMDA Platform

  1. Update python/census_config.yaml to include the relevant year's Census file in the msa_md_delinations section.
  2. Update the year variable in python/create_ffiec_census_file.py to the year for which you want to generate the platform census file.
  3. Run python/create_ffiec_census_file.py.
  4. The file will be created in output/ as ffiec_census_msamd_names_<year>.txt
  5. Move the file into the HMDA-Platform repo as common/src/main/resources/ffiec_census_<year>.txt
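
Step 5 can be scripted. The following is a minimal sketch and not part of this repository: the function name publish_census_file and the output_dir and platform_repo arguments are hypothetical, while the two file names follow the steps above. It copies rather than moves, so the generated file stays in output/.

```python
import shutil
from pathlib import Path


def publish_census_file(year, output_dir, platform_repo):
    """Copy the generated census file into a local HMDA-Platform checkout.

    Hypothetical helper: output_dir and platform_repo are local paths
    supplied by the caller, not values defined by this repository.
    """
    source = Path(output_dir) / f"ffiec_census_msamd_names_{year}.txt"
    if not source.is_file():
        raise FileNotFoundError(f"generate the census file first: {source}")

    # Destination path and file name as given in step 5.
    target_dir = Path(platform_repo) / "common" / "src" / "main" / "resources"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"ffiec_census_{year}.txt"

    shutil.copyfile(source, target)
    return target
```

A call such as publish_census_file(2020, "output", "../hmda-platform") would return the path of the copied file; both paths here are placeholders for local checkouts.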

Working With the Scripts

Configuration: Determines which years of data to use, allows selection of fields in both data files, and contains the relevant data specifications and URLs.

The configuration is used by the class in census_functions.py. The test.py script contains examples that use the class to download, cut, merge, and load the resulting census data to a database.
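
To illustrate what "cutting" fields by configuration could look like, here is a minimal sketch. It does not reproduce the code in census_functions.py: the EXAMPLE_FIELDS mapping, its field names and column positions, and the cut_fields function are all assumptions made for illustration, and the real census_config.yaml may be organized differently.

```python
import csv

# Hypothetical config shape: field name -> zero-based column position.
# These names and positions are illustrative, not taken from census_config.yaml.
EXAMPLE_FIELDS = {"year": 0, "msa_md": 1, "state": 2, "county": 3, "tract": 4}


def cut_fields(lines, field_positions):
    """Yield one dict per comma-delimited line, keeping only configured columns."""
    for row in csv.reader(lines):
        yield {name: row[pos] for name, pos in field_positions.items()}
```

Because cut_fields accepts any iterable of lines, it can be fed an open file handle as well as a list of strings.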

Current issues:

  • MSA to tract mapping verification needs to be updated for the new codebase
  • MSA delineation files from before 2000 are in a different format that has not yet been parsed

Sources of Data

The HMDA Platform uses data that combines elements of the FFIEC Census Flat File and the OMB MSA delineation files. The FFIEC Census file contains over 1,000 data elements, of which the HMDA Platform uses a small subset. The OMB MSA bulletins are primarily used for names.

The Office of Management and Budget produces MSA data. Updates can include changes to an MSA's boundaries or creation of new MSAs. These data have no regular publication cycle. HMDA Operations uses the MSA definitions in effect on 12/31 of the year preceding collection; this aligns with other Regulation C criteria.
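
The 12/31 rule can be expressed as a small date calculation. The sketch below is an illustration only: the function names are hypothetical, it assumes the caller passes the collection year, and the list of delineation effective dates must come from the caller because this document does not list them.

```python
from datetime import date


def delineation_cutoff(collection_year):
    """Return 12/31 of the year preceding the given collection year."""
    return date(collection_year - 1, 12, 31)


def pick_delineation(effective_dates, collection_year):
    """Pick the latest delineation in effect on or before the cutoff.

    effective_dates: iterable of datetime.date values (hypothetical input).
    Returns None when no delineation qualifies.
    """
    cutoff = delineation_cutoff(collection_year)
    eligible = [d for d in effective_dates if d <= cutoff]
    return max(eligible) if eligible else None
```

Since OMB updates have no regular cycle, selecting by date rather than by a fixed release year avoids assuming that a new delineation exists for every collection year.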

The Census delineation files are used to map names to MSA/MD geographies.

The FFIEC produces an annual Census Flat File containing demographic data and a mapping of MSA data to Census tract.

Additional Census data are available but not used in this project: the Census reference files contain MSA/MD and micropolitan statistical area definitions, names, and mappings to county and tract codes.

Uses of Data

The HMDA Platform uses Census data during data submission and publication.

During submission, Census data are used to verify the relationship between reported geographic identifiers for loans and applications.
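
As an illustration of this kind of check, the sketch below tests whether a reported state, county, and tract combination exists in the Census data and, optionally, whether it maps to the reported MSA/MD. It is not the HMDA Platform's validation logic; the function names, the row keys (state, county, tract, msa_md), and any sample codes used with it are assumptions.

```python
def build_tract_index(census_rows):
    """Index census rows by (state, county, tract) -> MSA/MD code.

    census_rows: iterable of dicts with hypothetical keys
    "state", "county", "tract", and "msa_md".
    """
    return {(r["state"], r["county"], r["tract"]): r["msa_md"] for r in census_rows}


def geography_is_consistent(index, state, county, tract, msa_md=None):
    """Return True when the reported identifiers agree with the census index."""
    key = (state, county, tract)
    if key not in index:
        # The tract is not known within the reported state and county.
        return False
    # When an MSA/MD is reported, it must match the census mapping.
    return msa_md is None or index[key] == msa_md
```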

In publication the Census demographic and geographic data are used to add demographic information to LAR datasets. The variables added include:

  • Total Population
  • Minority Population Percentage
  • FFIEC Median Family Income
  • Tract to MSA/MD Income Percentage
  • Number of Owner Occupied Units
  • Number of 1 to 4 Family Units
  • MSA (new in 2018, was previously submitted by FIs)
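
Adding these variables amounts to a lookup by Census tract. The following is a minimal sketch of such a join, not the HMDA-Platform implementation: the field names in CENSUS_FIELDS, the "tract" key, and the function name are hypothetical stand-ins for the variables listed above.

```python
# Hypothetical field names standing in for the variables listed above.
CENSUS_FIELDS = (
    "total_population",
    "minority_population_pct",
    "ffiec_median_family_income",
    "tract_to_msa_md_income_pct",
    "owner_occupied_units",
    "one_to_four_family_units",
    "msa_md",
)


def add_census_fields(lar_rows, census_by_tract):
    """Return copies of LAR rows with census fields attached by tract.

    lar_rows: iterable of dicts, each with a "tract" key (assumed name).
    census_by_tract: dict mapping a tract identifier to a dict of census fields.
    Fields missing from the census data are set to None.
    """
    enriched = []
    for row in lar_rows:
        census = census_by_tract.get(row["tract"], {})
        merged = dict(row)  # copy so the input rows are left unchanged
        for field in CENSUS_FIELDS:
            merged[field] = census.get(field)
        enriched.append(merged)
    return enriched
```

Rows whose tract is absent from the census data keep their original values and receive None for every added field, which makes unmatched records easy to find afterwards.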

Census geographic data are used to map MSAs to county and tract areas in the Aggregate and Disclosure reports and to support geographic lookup features in the web interfaces of the HMDA data tools.

See here for the HMDA-Platform logic mapping Census to LAR data.

HMDA Publication Products

  • Aggregate Reports: contain MSA level data on application and lending activity for all institutions reporting HMDA data.
  • Disclosure Reports: contain MSA level data on application and lending activity for a single institution.
  • LAR snapshot publication: contains the entire dataset of loans and applications submitted in accordance with Regulation C.

HMDA Platform Census Files
