Beast code in Giters

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

Language: Java | License: NOASSERTION | 98000

backend

Media Cloud is an open source, open data platform that allows researchers to answer quantitative questions about the content of online media.

Language: Python | License: AGPL-3.0 | 27700

api-client

Public client for consuming content from the Media Cloud Online News Archive & Directory.

Language: Python | License: Apache-2.0 | 6800

feed_seeker

Find RSS, Atom, XML, and RDF feeds on webpages

Language: Python | License: MIT | 3100

date_guesser

A library to extract a publication date from a web page, along with a measure of the accuracy.

Language: Python | License: MIT | 4200

nyt-news-labeler

Tag news stories based on models trained on the NYT corpus.

Language: Python | License: Apache-2.0 | 3900

opencorpora

A web-based engine for creating and annotating textual corpora

Language: PHP | License: GPL-2.0 | 24100

odie_backend

The admin site and api data source for the Online Discourse Insight Explorer.

Language: Ruby | 300

corpusbuilder

Corpus Build OCR platform

Language: CSS | License: AGPL-3.0 | 700

lumendatabase

The Lumen Database collects and analyzes legal complaints and requests for removal of online materials.

Language: Ruby | License: GPL-2.0 | 14300

ultimate-sitemap-parser

Ultimate Website Sitemap Parser

Language: Python | License: NOASSERTION | 17500

sentence-splitter

Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder.

Language: Python | License: NOASSERTION | 22400

test-lists

URL testing lists intended for discovering website censorship

Language: Python | 43400

internet_monitor

The Internet Monitor is a research project to evaluate, describe, and summarize the means, mechanisms, and extent of Internet content controls and Internet activity around the world.

Language: HTML | 22000
