SOTorrent's repositories
db-scripts
SQL and Bash scripts to import the official Stack Overflow data dump and the SOTorrent dataset, to retrieve Stack Overflow references from the BigQuery GitHub dataset, and to retrieve data from the SOTorrent dataset for analysis.
posthistory-extractor
Extracts the version history of text and code blocks from the official Stack Overflow data dump.
string-similarity
Implementation of various string similarity metrics.
metric-evaluation
Comparison of different string similarity metrics for reconstructing the history of Stack Overflow posts.
so-edit-viz
Visualization of edit and comment events in Stack Overflow threads.
posthistory-comparator-gt-cs
Comparator app to validate connections of ground truth and computed similarity.
posthistory-gt
Tool to create manually validated Stack Overflow post histories.
posttags-extractor
Extracts tags from posts in Stack Overflow data dumps.
postview-extractor
Extracts the view count of threads from Stack Overflow data dumps.
preprocessing-pipeline
Preprocessing pipeline to extract and normalize text/code blocks from Stack Exchange forum posts and comments.
so-internal-refs
Scripts used to import and analyze internal web server logs provided by Stack Overflow under an NDA.