awebson / Political-Vector-Projector

Unsupervised learning of political ideology by word vector projections.

Political Vector Projector

Given word vectors trained by word2vec (Mikolov et al. 2013) or fastText (Bojanowski et al. 2016), this program projects the vectors of U.S. senators onto a "conservative" to "liberal" axis. The scalar components of such projections may be interpreted as a valid metric of political ideology.
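To make the idea concrete, here is a minimal sketch of a scalar projection in plain Python. It assumes the axis is the difference between the "conservative" and "liberal" word vectors, which is one common way to build such an axis and not necessarily exactly what this repo does. The function names and the toy 3-dimensional vectors are invented for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def scalar_projection(vec, positive_pole, negative_pole):
    """Scalar component of `vec` along the axis running from
    `negative_pole` to `positive_pole`.

    Assumption: the axis is positive_pole - negative_pole.
    """
    axis = [p - n for p, n in zip(positive_pole, negative_pole)]
    norm = math.sqrt(dot(axis, axis))
    if norm == 0.0:
        raise ValueError("the two pole vectors are identical")
    return dot(vec, axis) / norm


# Toy vectors, invented for illustration only (real embeddings
# typically have hundreds of dimensions).
conservative = [1.0, 0.0, 0.0]
liberal = [-1.0, 0.0, 0.0]
senator_a = [0.6, 0.3, 0.1]
senator_b = [-0.4, 0.5, 0.2]

score_a = scalar_projection(senator_a, conservative, liberal)
score_b = scalar_projection(senator_b, conservative, liberal)
print(score_a, score_b)
```

A larger score places a name closer to the "conservative" end of the axis, a smaller (or negative) score closer to the "liberal" end.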

Learn more about this project here. See this IPython Notebook for complete experiment results.

Highlight

Plotting the vector-projected ideology against DW-NOMINATE, an ideology metric widely used in political science, reveals a strong correlation:

| Training Corpus | Avg. Pearson’s r | Avg. Spearman’s ρ |
| --- | --- | --- |
| NYT 1981 - 2016 | 0.7559 | 0.7602 |
| Wash Post 1977 - 2007 | 0.7902 | 0.8003 |
| WSJ 1997 - 2017 | 0.7205 | 0.7184 |
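For readers unfamiliar with the two statistics in the table, the sketch below computes Pearson's r and Spearman's ρ in plain Python. It is not the evaluation code from this repo, and the two short score lists are invented purely to show the calculation. Spearman's ρ is computed here as Pearson's r on the ranks, with tied values sharing their average rank.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Invented scores for five hypothetical senators.
projected = [0.61, -0.42, 0.15, -0.08, 0.33]
dw_nominate = [0.52, -0.35, 0.02, 0.10, 0.41]

print(pearson_r(projected, dw_nominate))
print(spearman_rho(projected, dw_nominate))
```

If SciPy is installed, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` provide the same statistics along with p-values.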

In addition to members of Congress, you can also project the vectors of public policies. These results are quite amusing but still highly experimental. Again, for a detailed account, see here.

Requirements

fastText or word2vec

Gensim (optional; only needed for the experimental feature of projecting public policies)

The DW-NOMINATE ideology data is available at voteview.com. Some example data is already included in this repo.

I apologize that I have tested the code only with Python 3.6.

How-To

There are two methods for loading vectors into PoliVec Projector:

First Method: Use the Word2VecProjector class to read vector files generated by word2vec. Call evaluate_ideology_projection() to evaluate a single congressional session of ideology data. Call multiyear_evaluation() and pass an iterable of session numbers, e.g. multiyear_evaluation(cgrs_sess=range(97,115)), to evaluate multiple years of data. The IPython notebook includes several examples that will help you get started.
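When choosing a range for cgrs_sess, it helps to know which years each session number covers. The helper below is a small sketch that is not part of this repo (the name session_years is made up); it relies only on the fact that the 1st Congress convened in 1789 and each Congress lasts two years. Note that Python's range excludes its upper bound, so range(97, 115) covers the 97th through the 114th session.

```python
def session_years(session):
    """Return (start_year, end_year) for a numbered U.S. Congress.

    Hypothetical helper, not part of this repo.
    """
    if session < 1:
        raise ValueError("session numbers start at 1")
    start = 1789 + 2 * (session - 1)
    return start, start + 2


# range(97, 115) yields 97, 98, ..., 114 (the upper bound is excluded).
sessions = list(range(97, 115))
first_start, _ = session_years(sessions[0])
_, last_end = session_years(sessions[-1])
print(f"{len(sessions)} sessions covering {first_start} to {last_end}")
```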

Second Method: (seemingly more complicated, but more efficient for comparing multiple years of data) Provide a plain text list of words you want to query, along with the axes onto which you want to project. The axes follow the order of: [positive x axis, negative x axis, positive y axis, negative y axis]. An example queries.txt looks like this:

conservative
liberal
good
bad
johnson
nixon
carter
reagan
etc.
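As an illustration of how such a file might be read, here is a minimal sketch. It assumes the four axis words occupy the first four non-empty lines and every remaining line is a word to project, which is what the example above suggests. The function name parse_queries and the dictionary keys are invented here and are not part of this repo.

```python
def parse_queries(lines):
    """Split a query list into axis words and words to project.

    Assumption: the first four non-empty lines are, in order,
    [positive x axis, negative x axis, positive y axis, negative y axis].
    """
    words = [line.strip() for line in lines if line.strip()]
    if len(words) < 4:
        raise ValueError("need at least four lines to define both axes")
    axes = {
        "x_positive": words[0],
        "x_negative": words[1],
        "y_positive": words[2],
        "y_negative": words[3],
    }
    return axes, words[4:]


example = """conservative
liberal
good
bad
johnson
nixon
carter
reagan
"""

axes, names = parse_queries(example.splitlines())
print(axes)
print(names)
```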

If you are interested in members of Congress, the gen_queries.py script in this repo can take care of this step for you. The name lists of the 95th to 114th Senate (1977 - 2017) are also already included in this repo at queries/.

Then, run gen_vectors.sh, which takes multiple lists of queries and feeds them to fastText's print-word-vectors function. Be sure to revise the directories specified in the shell script so that it loads your own pre-trained fastText models. The vectors of members of the 95th to 114th Senate are also already included in this repo at queried_vectors/.
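If you want to inspect the queried vectors yourself, the sketch below parses them into a dictionary. It assumes each output line holds a word followed by its space-separated vector components, which is how fastText's print-word-vectors output is usually laid out; check your own files before relying on it. The function name and the two sample lines are invented for illustration, and this is not the loader used by FastTextProjector.

```python
def parse_vector_lines(lines):
    """Parse lines of the form 'word v1 v2 ... vn' into {word: [floats]}.

    Assumption: whitespace-separated fields, word first, no header line.
    """
    vectors = {}
    dim = None
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        values = [float(x) for x in parts[1:]]
        if dim is None:
            dim = len(values)
        elif len(values) != dim:
            raise ValueError(f"inconsistent dimension for {parts[0]!r}")
        vectors[parts[0]] = values
    return vectors


# Invented sample output with 3-dimensional vectors.
sample_output = [
    "conservative 0.12 -0.48 0.33",
    "liberal -0.27 0.51 0.09",
]
vectors = parse_vector_lines(sample_output)
print(sorted(vectors))
```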

Lastly, create a FastTextProjector object to load the queried vectors, then call evaluate_ideology_projection() or multiyear_evaluation().

(In principle, you can use PoliVec Projector with any word embedding model, so long as you write a subclass that tweaks the file I/O methods to load your vectors properly.)
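The subclassing pattern might look roughly like the sketch below. Every name in it is hypothetical: VectorLoader is a stand-in for a projector base class, read_vectors stands in for whatever file I/O method needs overriding, and the comma-separated format is made up. Check the actual class and method names in the source before adapting it.

```python
import csv
import io


class VectorLoader:
    """Hypothetical stand-in for a projector base class."""

    def __init__(self, stream):
        # Subclasses decide how the stream is turned into vectors.
        self.vectors = self.read_vectors(stream)

    def read_vectors(self, stream):
        raise NotImplementedError("subclasses must implement read_vectors")


class CsvVectorLoader(VectorLoader):
    """Reads rows of the form: word,v1,v2,...,vn (an invented format)."""

    def read_vectors(self, stream):
        vectors = {}
        for row in csv.reader(stream):
            if not row:
                continue  # skip empty rows
            vectors[row[0]] = [float(x) for x in row[1:]]
        return vectors


# In-memory sample standing in for a file of vectors.
sample = io.StringIO("reagan,0.6,0.3\ncarter,-0.4,0.5\n")
loader = CsvVectorLoader(sample)
print(loader.vectors)
```

Only the reading step changes between embedding formats; the projection and evaluation logic can stay in the base class.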

License

MIT
