Michael Miller Yoder's repositories
fanfiction-nlp
An NLP processing pipeline for characters in fanfiction. Developed by students at Carnegie Mellon University beginning in 2019.
hate_speech_identities
Code for CoNLL 2022 paper, "How Hate Speech Varies by Target Identity: A Computational Analysis"
wikipedia-talk-scores
Wikipedia talk page dataset for an IJCNLP 2017 paper.
av-survey-topic-modeling
Topic modeling of an AV survey by BikePGH
fanfiction-scripts
Scripts for processing fanfiction for a CMU project started in 2018
tumblr-scripts
Scripts for processing Tumblr data and running experiments
adl_covid
Code for a project tracing associations between anti-vaccine sentiment and right-wing extremist narratives
AO3Scraper
A Python scraper for getting fan fiction content and metadata from Archive of Our Own.
book-nlp
Modified version of BookNLP from David Bamman https://github.com/dbamman/book-nlp
book-nlp-quote-attribution
Natural language processing pipeline for book-length documents
comp-ethics
Scripts for CMU Computational Ethics for NLP class 2017-2018
convote-scripts
Experiments on ConVote data
fanfiction
Scraping tools for fanfiction.net
fanfiction-nlp-archive
Stores legacy code for FanfictionNLP
fanfiction-nlp-evaluation
Scripts for evaluating fanfiction NLP processing pipeline (fanfiction-nlp)
hate_speech_rhetoric
Project exploring rhetorical types of hate speech
misc-scripts
Miscellaneous scripts
nn4nlp-scripts
Scripts for CMU Neural Networks for NLP course
predicting-book-genre-with-lstm-model
I implement an RNN with two LSTM layers and one embedding layer to predict the genre of a book from its description.
python-corenlp-protobuf
Python bindings for Stanford CoreNLP's protobufs.
SAGE
Sparse Additive Generative Model of Text
scholar.hasfailed.us
Google Scholar is a trans-exclusionary site. Don't use it. Help us demand change.
SemAxis
SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment
socialbias_vaccine
Code for examining how identities are used in pro-vaccine and anti-vaccine stances in COVID-19 discussions (with Lynnette Hui Xian Ng)
storyq_scripts
Scripts for processing informal narrative for education
textClassifier
Text classifier for Hierarchical Attention Networks for Document Classification
tumblr_community_identity
Learning representations for identity labels on Tumblr that highlight axes of similarity and difference that are relevant to communities.
wikipedia-codeswitching-data
Wikipedia talk page code-switching features and editor success scores. Contact yoder@cs.cmu.edu for scripts for dataset construction.