Steph Buongiorno's repositories
stephbuon.github.io
My portfolio.
AdHominem
Authorship Verification in Social Media via Attention-based Similarity Learning
hansard-speakers
A data processing pipeline to disambiguate speakers in the 19th-century British Parliamentary debates.
democracy-lab
Code, manuals, and concepts for Democracy Lab research and affiliate projects.
dhmeasures
"White box" statistical functions for analyzing textual corpora.
posextract
Grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora.
Auto-GPT
An experimental open-source attempt to make GPT-4 fully autonomous.
hpc_docs
HPC Documentation and Examples
homepage
Source code for ropengov.org
noaa
Access current and historical weather data from the National Oceanic and Atmospheric Administration (NOAA)
posextractr
Grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora.
hansardr
Access a cleaned version of the 19th-century Hansard corpus with improved speaker names in the R environment.
rogtemplate
pkgdown template for rOpenGov packages
congressional-data-scraper
Export an analysis-ready version of the Daily Editions of the U.S. Congressional Records.
hansard-shiny
Code for the "Hansard Viewer" web app (a prototype built for applying for future support).
congress-shiny
Code for the "Congress Viewer" web app (a prototype built for applying for future support).
pytorch_active_learning
PyTorch library for active learning, accompanying the book Human-in-the-Loop Machine Learning
digital-history
Instructional repository for "Text Mining as Historical Method"
twitterscraper
Scrape Twitter for Tweets
text_mining_data_sets
Notebooks for accessing data for text mining tutorials and projects on M2
concept-lab-viewer-march
Latest version of the Shiny app created for the Concept Lab for viewing conceptual networks
think-play-hack
Think-Play-Hack: World Views
get_hansard_data
Script to pull down Hansard data until the API works.
box_archive
Scripts to tar, compress, and upload large datasets to Box. The scripts use GNU tar's multivolume feature to keep each file under 15 GB and Slurm to parallelize uploading the archives.