saffsd's repositories
kaggle-stackoverflow2012
My entry to the Kaggle 2012 Stack Overflow competition. Ranked 10th on the final public leaderboard.
geniatagger
Part-of-speech tagging, shallow parsing, and named entity recognition for biomedical text.
kaggle-stumbleupon2013
My entry to the Kaggle 2013 StumbleUpon competition. Ranked 4th on the final private leaderboard.
linguini.py
linguini.py is a pure-Python implementation of linguini, a vector-space model language identifier with support for bilingual and trilingual documents.
assignmentprint
Pretty printer for student-submitted assignments. Helps with pretty-printing student code and generating reports.
forum_features
Data model for manipulating forum data.
language_data
Pythonic interface to natural language metadata.
alta2012-langidforlm
Code to build corpora from ClueWeb09.
alta2012-sharedtask
Full reference implementation of the entry that won the ALTA2012 Shared Task.
alta2012-usim
Supporting materials for the ALTA2012 publication "Unsupervised Estimation of Word Usage Similarity".
LibSVMsharp
C# wrapper for LibSVM.
python-readability
Fast Python port of arc90's readability tool, updated to match the latest readability.js.