John Schriner's repositories
4chan-a-b-pol
A corpus collection of 4chan's /a/ and /b/ from June 2015 and /a/, /b/, and /pol/ from July 2019
chan-pol
A script using Selenium that expands /pol/ threads, scrapes them, and cleans up the text for corpus use
librarycode
Some HTML and JS code for the site
NLP
NLP Projects
presentations
Presentations and Works in Progress
RU-Stress-Prediction
Using Zaliznjak's dictionary and stresscodes, I use FairSeq to predict Russian stress
SFWpy
Categorizes images and gives them an NSFW evaluation
swapscrape
An automation tool using Selenium and ImageScraper to grab images interactively from a page
TACIT
We introduce TACIT: An Open-Source Text Analysis, Crawling and Interpretation Tool. TACIT's plugin architecture has three main components: (1) crawling plugins, (2) corpus management, and (3) analysis plugins. TACIT's open-source plugin platform allows the architecture to adapt easily to rapid developments in text analysis.
xkcd-substitutions-mozilla
A Firefox extension that replaces ordinary words with much more fun ones.