saffsd's repositories

langid.py

Stand-alone language identification system

Language: Python | License: NOASSERTION | Stargazers: 2278 | Issues: 65 | Issues: 72

kaggle-stackoverflow2012

My entry to the Kaggle 2012 Stack Overflow competition. Ranked 10th on the final public leaderboard.

Language: Python | Stargazers: 46 | Issues: 2 | Issues: 0

wikidump

Tools to manipulate and extract data from Wikipedia dumps

Language: Python | License: GPL-3.0 | Stargazers: 43 | Issues: 6 | Issues: 4

polyglot

Polyglot is a language identifier for detecting text documents containing text written in more than one language, and for identifying the languages therein.

Language: Python | License: NOASSERTION | Stargazers: 32 | Issues: 5 | Issues: 1

langid.c

Pure C natural language identifier with support for 97 languages

Language: C | License: NOASSERTION | Stargazers: 24 | Issues: 4 | Issues: 3

geniatagger

Part-of-speech tagging, shallow parsing, and named entity recognition for biomedical text

Language: C++ | License: NOASSERTION | Stargazers: 22 | Issues: 4 | Issues: 0

kaggle-stumbleupon2013

My entry to the Kaggle 2013 StumbleUpon competition. Ranked 4th on the final private leaderboard.

langid.js

An off-the-shelf client-side language identification module for JavaScript.

Language: JavaScript | License: NOASSERTION | Stargazers: 14 | Issues: 4 | Issues: 1

imgevolve

Evolve images from sets of triangles.

Language: Python | Stargazers: 7 | Issues: 3 | Issues: 0

updatedir

Rsync-like directory updating over multiple protocols

Language: Python | License: GPL-3.0 | Stargazers: 3 | Issues: 2 | Issues: 0

daifugo

Simulation system for the Japanese card game Daifugo.

Language: Python | Stargazers: 2 | Issues: 2 | Issues: 0

ldig

Language Detection with Infinity-gram

Language: C++ | Stargazers: 2 | Issues: 3 | Issues: 0

linguini.py

linguini.py is a pure-Python implementation of linguini, a vector-space model language identifier with support for bilingual and trilingual documents.

Language: Python | License: NOASSERTION | Stargazers: 2 | Issues: 2 | Issues: 0

assignmentprint

Pretty printer for student-submitted assignments. Helps with prettyprinting student code and generating reports.

Language: Python | License: GPL-3.0 | Stargazers: 1 | Issues: 0 | Issues: 0

forum_features

Data model for manipulating forum data.

Language: Python | Stargazers: 1 | Issues: 2 | Issues: 0

language_data

Pythonic interface to natural language metadata

Language: ASP | Stargazers: 1 | Issues: 2 | Issues: 0

alta2012-langidforlm

Code to build corpora from ClueWeb09

Language: Python | License: GPL-3.0 | Stargazers: 0 | Issues: 2 | Issues: 0

alta2012-sharedtask

Full reference implementation of the entry that won the ALTA2012 Shared Task.

Language: Python | Stargazers: 0 | Issues: 2 | Issues: 0

alta2012-usim

Supporting materials for ALTA2012 publication "Unsupervised Estimation of Word Usage Similarity"

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

LibSVMsharp

C# wrapper of LibSVM

Language: C# | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0

piboso

Sentence tagger for biomedical abstracts.

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 2 | Issues: 0

python-readability

Fast Python port of arc90's readability tool, updated to match the latest readability.js.

Language: Python | Stargazers: 0 | Issues: 2 | Issues: 0