jg-bernard's starred repositories

memes_pipeline

Meme processing pipeline that enables tracking memes across multiple Web communities.

Language: Python | Stargazers: 55 | Issues: 0

imagehash

A Python Perceptual Image Hashing Module

Language: Python | License: BSD-2-Clause | Stargazers: 3093 | Issues: 0

WebScraping

SSRMC lectures in Web Scraping, HT 2014

Language: R | Stargazers: 24 | Issues: 0

fb_scrape_public

Scrapes posts and comments from public Facebook pages.

Language: Python | License: BSD-3-Clause | Stargazers: 106 | Issues: 0

geostring

From free-form text to standardized geographical info.

Language: Python | License: BSD-3-Clause | Stargazers: 9 | Issues: 0

TSM

TSM - Twitter Subgraph Manipulator

Language: Python | License: NOASSERTION | Stargazers: 83 | Issues: 0

unspooler

Research-grade URL expansion for Python.

Language: Python | License: NOASSERTION | Stargazers: 25 | Issues: 0

news_extract

Python module to extract articles from NexisUni and Factiva.

Language: Python | License: BSD-3-Clause | Stargazers: 36 | Issues: 0

RMallet

R package wrapping Mallet

Language: R | Stargazers: 37 | Issues: 0

Mallet

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

Language: Java | License: NOASSERTION | Stargazers: 980 | Issues: 0

backend

Media Cloud is an open source, open data platform that allows researchers to answer quantitative questions about the content of online media.

Language: Python | License: AGPL-3.0 | Stargazers: 277 | Issues: 0

api-client

Public client for consuming content from the Media Cloud Online News Archive & Directory.

Language: Python | License: Apache-2.0 | Stargazers: 68 | Issues: 0

feed_seeker

Find RSS, Atom, XML, and RDF feeds on webpages.

Language: Python | License: MIT | Stargazers: 31 | Issues: 0

date_guesser

A library to extract a publication date from a web page, along with a measure of the accuracy.

Language: Python | License: MIT | Stargazers: 42 | Issues: 0

nyt-news-labeler

Tag news stories based on models trained on the NYT corpus.

Language: Python | License: Apache-2.0 | Stargazers: 39 | Issues: 0

opencorpora

A web-based engine for creating and annotating textual corpora

Language: PHP | License: GPL-2.0 | Stargazers: 241 | Issues: 0

odie_backend

The admin site and API data source for the Online Discourse Insight Explorer.

Language: Ruby | Stargazers: 3 | Issues: 0

corpusbuilder

Corpus Builder OCR platform

Language: CSS | License: AGPL-3.0 | Stargazers: 7 | Issues: 0

lumendatabase

The Lumen Database collects and analyzes legal complaints and requests for removal of online materials.

Language: Ruby | License: GPL-2.0 | Stargazers: 143 | Issues: 0

ultimate-sitemap-parser

Ultimate Website Sitemap Parser

Language: Python | License: NOASSERTION | Stargazers: 175 | Issues: 0

sentence-splitter

Text-to-sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder.

Language: Python | License: NOASSERTION | Stargazers: 224 | Issues: 0

test-lists

URL testing lists intended for discovering website censorship

Language: Python | Stargazers: 434 | Issues: 0

internet_monitor

The Internet Monitor is a research project to evaluate, describe, and summarize the means, mechanisms, and extent of Internet content controls and Internet activity around the world.

Language: HTML | Stargazers: 220 | Issues: 0