Shawn M. Jones's repositories
OffTopic-Detection
This system evaluates a series of mementos (archived web pages) to determine which are off topic. The series can be part of an Archive-It collection, a single TimeMap, or stored in a WARC file.
archivenow
A Tool To Push Web Resources Into Web Archives
collection-stories
This repository exists to share stories generated from the Dark and Stormy Archives project.
cs595-f13
Shared repository for ODU CS 495 / 595 Fall 2013
cs895-f20
ODU CS 795/895 Web Archiving Forensics, Fall 2020.
dsa-rainpuddle
This project implements the visualization components of the Dark and Stormy Archives project.
government-sites-archive-projects
This repository contains work done to determine how much of www.guideline.gov and qualitymeasures.ahrq.gov was archived.
hr-contracting
CS 825 project showing the geography of federal contracting in Hampton Roads
iipc-dsa-work
This repository contains work done on the IIPC Dark and Stormy Archives grant.
JCDL2023-website-source
The source of the JCDL 2023 website.
py-memento-client
A Memento Client Library in Python
python-boilerpipe
Python interface to Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages
robustlinks
Links on the web break all the time, robustify them!
shawnmjones.github.io
Shawn's GitHub Web Site
shot-scraper-test
https://www.ap.org/en
sqlite3worker
A threadsafe sqlite worker for Python
sumgram
sumgram is a tool that summarizes a collection of text documents by generating the most frequent sumgrams (multiple ngrams)
Timemap.py
Python class to parse and simplify access to Memento TimeMaps.
VisHash
Visual Hash for matching copies of visually similar images.
wren
Experiments in testable, scalable crawler architectures
wsdlthesis
ODU WS-DL Thesis/Dissertation LaTeX Template