Cammie King's repositories
distributions
Low-level primitives for collapsed Gibbs sampling in Python and C++
ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora: English Wikipedia and Twitter (330 million English tweets).
go-probably
Probabilistic Data Structures for Go
imbalanced-learn
Python module to perform under-sampling and over-sampling with various techniques.
online-hdp
Online inference for the Hierarchical Dirichlet Process. Fits hierarchical Dirichlet process topic models to massive data. The algorithm determines the number of topics.
reddit-10-year-data
Data from the last ten years of Reddit
sayit-data
Data with a graph of similar subreddits
sklearn-diffmap
A scikit-learn compatible diffusion map implementation.
sortedcounter
A Counter like the one in the collections module, but with sorted keys (thanks to sortedcontainers)
sparseutil
A collection of utilities/helpers for working with sparse matrices.
UnsupervisedHypernymy
Data and code for the experiments in: "Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection". Vered Shwartz, Enrico Santus and Dominik Schlechtweg. EACL 2017.