Media Cloud's repositories
sentence-splitter
Text to sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder.
ultimate-sitemap-parser
Ultimate Website Sitemap Parser
cliff-annotator
A lightweight server to allow HTTP requests to the Stanford Named Entity Recognizer and a heavily modified CLAVIN geoparser.
api-client
Public client for consuming content from the Media Cloud Online News Archive & Directory.
nyt-news-labeler
Tag news stories based on models trained on the NYT corpus.
api-tutorial-notebooks
A set of Jupyter notebooks demonstrating how to use the Media Cloud API.
feed_seeker
Find RSS, Atom, XML, and RDF feeds on webpages
metadata-lib
How Media Cloud approaches extracting metadata from online news stories
web-search
Code that drives the public web-based tools for the Media Cloud Online News Archive and Directory.
cliff-api-client
A Python client for the CLIFF geoparsing tool
rss-fetcher
Intelligently fetch lists of URLs from a large collection of RSS feeds as part of the Media Cloud Directory.
wayback-news-client
A client library to access the Wayback Machine news archive search.
postgresql-citus-aws-graviton2
PostgreSQL built for AWS Graviton2
word-embeddings-server
Helpful microservice to return results from word2vec models
cliff-homepage
A simple homepage for the CLIFF project
news-search-api
Internal API server that offers search access to the Media Cloud Online News Archive (in Elasticsearch).
story-indexer
The core pipeline used to ingest online news stories in the Media Cloud archive.
a-catchall-page
Dokku app that serves a static HTML catch-all page, displayed for bad domains
backend-temporal-server-config
Temporal server configuration
backup-collection-maker
Notebook demonstrating how to create and update a Media Cloud collection.
mc-providers
Internal library to allow querying multiple media platforms with a consistent API.
mediacloud-news-client
An internal client library to access the new Media Cloud news archive search.
sc-buffet
Sous-chef buffet - Self-service data access for sous-chef.
system-metrics
Daily performance metrics for the Media Cloud application
wal-g-aws-graviton2
WAL-G built for AWS Graviton2