Sonification of Income Inequality on the NYC Subway

This is a set of scripts that generate songs from median household income data along subway lines in New York City. It extends a song I produced as the Data-Driven DJ in 2015. For more information about how that song was created, visit the project page on the Data-Driven DJ website.

This codebase produces music in the same way as the Data-Driven DJ project referenced above, but it improves the process of generating new songs from new data (in this case, 2017 American Community Survey, or ACS, data) and adds support for generating songs for any subway line.

Data sources

MTA subway station data via MTA developers page
MTA subway line colors
2010 New York City Census Tracts
Median household income by census tract, 2017 ACS 5-year estimates (B19013). Obtained using the following options:
- Geographies > Geographic type > Census Tract - 140
- New York -> All Census Tracts within New York
- Topics > Income/Earnings (Households)
- ID: B19013 / MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS) / 2017 ACS 5-year estimates

I generated a simple visualization that combines the Census tract data with income data.

Requirements for generating music and visualization

Python 3 (developed using Python 3.6, but likely 3.5+ should work)
Numpy
Pydub - For audio manipulation

Only required for visualization

The song comes with a basic visualization showing where you are in New York City at any given time in the song. This step requires a few more libraries:

Pillow - For image generation
Gizeh - For vector graphics. Requires Cairo to be installed
FFmpeg - For encoding the video file

Only required for preprocess step

This is only necessary if you're generating songs from different or new data (i.e. not the 2017 data in the repository).

Shapely - For geometric calculations

Preprocessing new data

This repository already contains preprocessed data from the 2017 American Community Survey (ACS). If you have a different dataset obtained from the Census, you can do the following to preprocess the data. Otherwise, you can skip this step.

python preprocess.py -census "data/YOUR_DATA_FILE.csv"

This script does the following:

Reads median household income data from the Census, broken down by census tract
Reads 2010 NYC Census tract data and determines lat/lon coordinates for each tract
Reads MTA subway station data which contains the lat/lon for each station
Matches each subway station to the two closest census tracts
Takes the mean of the median household incomes of the two tracts, weighted by distance from the station. This accounts for a station that sits at the edge of two tracts, or for two stations that fall within the same tract.
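The last two steps can be sketched roughly as follows. This is a minimal illustration, not the code in preprocess.py: it assumes inverse-distance weighting (the closer tract counts for more) and haversine distance, and the names haversine_km and station_income, the dictionary keys, and the sample coordinates and incomes are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def station_income(station, tracts):
    """Estimate income at a station from its two closest tracts (hypothetical helper)."""
    # Pair every tract with its distance to the station, then keep the two nearest.
    ranked = sorted(
        ((haversine_km(station["lat"], station["lon"], t["lat"], t["lon"]), t["income"])
         for t in tracts),
        key=lambda pair: pair[0],
    )
    nearest = ranked[:2]

    # A station sitting exactly on a tract's coordinates takes that tract's income.
    for dist, income in nearest:
        if dist == 0:
            return income

    # Assumption: weights are inverse distances, so the closer tract dominates.
    weights = [1.0 / dist for dist, _ in nearest]
    weighted_sum = sum(w * income for w, (_, income) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Made-up tracts and station, for illustration only.
tracts = [
    {"lat": 40.0, "lon": -74.0, "income": 40000},
    {"lat": 40.0, "lon": -73.98, "income": 60000},
    {"lat": 41.0, "lon": -74.0, "income": 100000},
]
station = {"lat": 40.0, "lon": -73.99}
print(round(station_income(station, tracts)))
```

Here the station sits halfway between the first two tracts, so both contribute about equally and the distant third tract is ignored.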

This will generate a .csv file for each subway line at data/lines/{LINE SYMBOL}.csv. Each file contains an income column that represents the median household income of the census tracts surrounding each station.
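To sanity-check a generated file, the income column can be read with Python's standard csv module. The sketch below is only an illustration: load_incomes is a hypothetical helper, and apart from income, the column names and values in the sample are made up.

```python
import csv
import io


def load_incomes(csv_file):
    """Return the income column of a per-line file as floats, in file order."""
    reader = csv.DictReader(csv_file)
    # Skip rows where the income cell is empty.
    return [float(row["income"]) for row in reader if row.get("income")]


# Stand-in for a file such as data/lines/7.csv; the "name" column and values are invented.
sample = io.StringIO(
    "name,income\n"
    "Station A,52000\n"
    "Station B,61000.5\n"
    "Station C,48000\n"
)
incomes = load_incomes(sample)
print(min(incomes), max(incomes))
```

With a real file, pass an open file handle instead, for example one created with open("data/lines/7.csv", newline="").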

These files have already been generated for the 2017 data and are included in the data/lines/ folder.

Generating music and visualization

The following script generates both the audio and visuals for a single subway line and compiles them into a video. At a minimum, you need to provide the subway line's .csv file (-data) and an image that represents the line's bullet symbol (-img).

python make.py -data "data/lines/7.csv" -img "img/7.png"

If you just want the audio, you can run:

python make.py -data "data/lines/7.csv" -ao

When creating a song for an express train, you might want to include the local stops as implicit data points between express stops. In this case, you will not see labels for the local stops, but they will add nuance between express stops, which can span a long distance. Here's a command that creates a song based on the 2 train with local 1 train stops between express stops:

python make.py -data "data/lines/2.csv" -loc "data/lines/1.csv"

A large number of options are available for tweaking the end result. You can find their descriptions by running:

python make.py -h

Conversion to .webm format

With a target video bitrate of 400 kbit/s:

ffmpeg -i subway_line_7_loop.mp4 -c:v libvpx-vp9 -b:v 400K -pass 1 -an -f webm /dev/null && \
ffmpeg -i subway_line_7_loop.mp4 -c:v libvpx-vp9 -b:v 400K -pass 2 -c:a libopus subway_line_7_loop.webm

For Windows:

ffmpeg -i subway_line_7_loop.mp4 -c:v libvpx-vp9 -b:v 400K -pass 1 -an -f webm NUL && ^
ffmpeg -i subway_line_7_loop.mp4 -c:v libvpx-vp9 -b:v 400K -pass 2 -c:a libopus subway_line_7_loop.webm
