appeler / dime_race

Using ethnicolr to predict the race of people in DIME

DIME Race

Impute the race of people in the Database on Ideology, Money in Politics, and Elections (DIME).

Steps:

  1. Subset: We first subset the database to keep:
     a. a unique ID (we use one that uniquely identifies each contribution),
     b. the columns related to name, and
     c. the year in which the contribution was made.

  2. De-duplicate: We build a key by concatenating the name and the year of the contribution, then de-duplicate on that key. The final dataset has all the columns from step 1 (the key column is dropped), just fewer rows. This removes multiple contributions in the same year from the 'same' person; our ability to detect the 'same' person is limited by spelling errors and similar inconsistencies.

  3. Predict: We use ethnicolr (https://github.com/appeler/ethnicolr) to impute race. Of the functions the package exposes, we use:

     a. census_ln (takes the last name), run with both the 2000 and the 2010 Census data
     b. pred_fl_reg_ln and pred_fl_reg_name (models trained on Florida voter registration data; the first takes the last name, the second the first and last name)

We then export the results to a file.
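Steps 1 and 2 can be sketched in plain Python as below. This is a minimal illustration, not the repository's script: it assumes a CSV export with hypothetical column names `uniqueid`, `first_name`, `last_name`, and `year` (the real DIME field names may differ). It keys on a (first name, last name, year) tuple rather than a raw concatenated string, so two different name splits cannot collide into the same key. The prediction step is omitted because it requires the ethnicolr package and its models.

```python
import csv
import io

# Hypothetical column names; the real DIME field names may differ.
KEEP_COLS = ["uniqueid", "first_name", "last_name", "year"]
NAME_COLS = ["first_name", "last_name"]


def subset_rows(rows, keep=KEEP_COLS):
    """Step 1: keep only the id, name, and year columns."""
    return [{col: row[col] for col in keep} for row in rows]


def dedupe_by_name_year(rows, name_cols=NAME_COLS, year_col="year"):
    """Step 2: keep the first row seen for each (name, year) key.

    The key is built on the fly and never stored on the row, so the
    output has the same columns as the input, just fewer rows.
    """
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[c] for c in name_cols) + (row[year_col],)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


# Toy input: two 2010 contributions from "John Smith", one from 2012,
# and a misspelled "Jon Smith" that exact matching cannot link.
sample = """uniqueid,first_name,last_name,year,amount
1,John,Smith,2010,100
2,John,Smith,2010,250
3,John,Smith,2012,50
4,Jon,Smith,2010,75
"""

rows = list(csv.DictReader(io.StringIO(sample)))
deduped = dedupe_by_name_year(subset_rows(rows))
print([r["uniqueid"] for r in deduped])  # ['1', '3', '4']
```

The second 2010 "John Smith" row is dropped, while "Jon Smith" survives as a separate person, which illustrates the spelling-error limitation noted in step 2.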

Script

Data

Data will be posted at: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/M5K7VR
