There are 7 repositories under the text-as-data topic.
Beautiful visualizations of how language differs among document types.
A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
Interpretable data visualizations for understanding how texts differ at the word level
Notebooks for the Seattle PyData 2017 talk on Scattertext
Summer/winter schools, workshops, and conferences in computational social science 🫂
A tool for Semantic Scaling of Political Text (branch of Topfish, a suite of tools for Political Text Analysis)
Literature 📄 and datasets 📚 on automatic populism detection
Code and models for 3 different tools to measure appeals to 8 discrete emotions in German political text
2018 Computational Text Analysis Notebooks, University of Mannheim
LinkOrgs: An R package for linking records on organizations using half a billion open-collaborated records from LinkedIn
Summer 2017 Social Media Analytics Workshop Series
An Automated Web Crawler for Extracting Central Bankers' Speeches
'dictvectoR' measures the similarity between a concept dictionary and documents, using fastText word vectors. Implements the "Distributed-Dictionary-Representation" (Garten et al. 2018) method in R.
The ABC of Computational Text Analysis. BA Seminar, Spring 2022, University of Lucerne
A small showcase for topic modeling with the tmtoolkit Python package. I use a corpus of articles from the German online news website Spiegel Online (SPON) to create a topic model for before and during the COVID-19 pandemic.
A tutorial on using regular expressions in R
From using xpdf, rvest, and quanteda on United Nations Digital Library search results to applying dictionaries to speeches in United Nations meeting records
All code and results for my MDS thesis at the Hertie School
The ABC of Computational Text Analysis. BA Seminar, Spring 2021, University of Lucerne
Empirical framework applied to parliamentary discourse and Twitter data, with a Discourse Polarization Index.
PhD Applied empirical economics at Stockholm University
This repository uses text-as-data methods alongside traditional primary source reading to analyze early American state constitutions. The R scripts create a function to scrape and clean the constitutional text, run sentiment analysis, calculate tf-idf, and perform LDA. This is a work-in-progress.
TextClass Benchmark Leaderboards
Original corpus of articles relating to refugees scraped from the Tennessee newspaper The Chattanoogan, along with simple code for a text-as-data word cloud.
Code for collecting and cleaning speeches (text) of the US 2020 election campaign. Corresponding publication: "A text dataset of campaign speeches of the main tickets in the 2020 US presidential election", by Ioannis Chalkiadakis, Louise Anglès d’Auriac, Gareth W. Peters, and Divina Frau-Meigs
Replication script for mining sentiments towards the EU from parliamentary speeches in the National Council of the Slovak Republic (1994-2023)
Material from my Machine Learning for the Social Sciences course