housing-data-coalition / oca

ETL process for deidentified NYC Housing Court Filings data

NYC Housing Court Filings

The OCA Data Collective regularly receives housing court filings data from the New York State Office of Court Administration (OCA). This repository manages the Extract-Transform-Load (ETL) process: retrieving the raw XML filings data from OCA via SFTP, parsing the nested XML into a set of tables, and making those tables publicly available for download as CSV files. These data are also now publicly available in XML format on the court system's website.
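As a rough illustration of the "nested XML into a set of tables" step, here is a minimal sketch using only the Python standard library. The element names (`Case`, `Hearing`), the `id` attribute, and the sample values are hypothetical placeholders, not the actual OCA schema, and this is not the code in /lib. It only shows the general idea of turning one nested record into rows in a parent table and a child table linked by a shared key.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input: element and attribute names are illustrative
# only and do not reflect the real OCA XML schema.
SAMPLE_XML = """
<Extract>
  <Case id="A1">
    <Court>Court X</Court>
    <Hearing><Date>2020-01-15</Date></Hearing>
    <Hearing><Date>2020-02-03</Date></Hearing>
  </Case>
  <Case id="B2">
    <Court>Court Y</Court>
  </Case>
</Extract>
"""


def flatten(xml_text):
    """Split nested case records into two flat tables (lists of dicts)."""
    root = ET.fromstring(xml_text.strip())
    cases, hearings = [], []
    for case in root.iter("Case"):
        case_id = case.get("id")
        # One row per case in the parent table.
        cases.append({"case_id": case_id, "court": case.findtext("Court")})
        # One row per nested hearing, carrying the parent key.
        for hearing in case.findall("Hearing"):
            hearings.append({"case_id": case_id, "date": hearing.findtext("Date")})
    return cases, hearings


def to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    cases, hearings = flatten(SAMPLE_XML)
    print(to_csv(cases, ["case_id", "court"]))
    print(to_csv(hearings, ["case_id", "date"]))
```

Keeping the parent key on every child row is what allows the resulting CSV files to be joined again after they are loaded into a database.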

To work with these data, you can use NYCDB to automatically load all of the tables into a PostgreSQL database for analysis. Documentation about the data, including a data dictionary, is available on the NYCDB wiki.

The OCA Data Collective includes the Right to Counsel Coalition, BetaNYC, the Association for Neighborhood and Housing Development, the University Neighborhood Housing Program, and JustFix. It is also affiliated with the Housing Data Coalition (HDC).

Attribution

When utilizing this work, please use one of the following attributions and links:

Data from the New York State Office of Court Administration via the OCA Data Collective in collaboration with the Right to Counsel Coalition.

Data from the New York State Office of Court Administration via the OCA Data Collective. This data has been obtained and made available through the collaborative efforts of the Right to Counsel Coalition, BetaNYC, the Association for Neighborhood and Housing Development, the University Neighborhood Housing Program, and JustFix.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

CSV Files

Date Last Updated

About the data

The data we receive from OCA is an extract of all landlord and tenant cases in NYC housing court, without personally identifying information. For more details about the raw data and the final parsed tables, see /docs.

About the code

For details about the individual components, see /lib.

Setup

Note that you can only run this yourself if you have HDC's credentials for the SFTP server that holds the raw data transferred from OCA, as well as access to the private AWS S3 storage where those files are kept.

You will need Docker.

First, you'll want to create an .env file by copying the example one:

cp .env.example .env     # Or 'copy .env.example .env' on Windows

Take a look at the .env file and fill in the AWS S3 credentials.
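Before starting the container, it can help to confirm that no variable in .env was left blank. The following is a minimal sketch, not part of this repository: it reads a .env-style text and reports every key that is declared without a value. The variable names in the usage note are hypothetical; the actual names are whatever .env.example defines.

```python
from pathlib import Path


def empty_env_keys(env_text):
    """Return the names of variables that are declared but have no value."""
    missing = []
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Treat empty quotes ('' or "") the same as no value at all.
        if not value.strip().strip("'\""):
            missing.append(key.strip())
    return missing


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        for key in empty_env_keys(env_path.read_text()):
            print(f"{key} is not set in .env")
    else:
        print("No .env file found; copy .env.example first.")
```

For example, given the hypothetical contents `EXAMPLE_KEY_ID=` and `EXAMPLE_SECRET=abc` on separate lines, the function returns only `EXAMPLE_KEY_ID`.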

To run the whole process in the Docker container, run:

docker-compose run app
