UK Web Archive's repositories
webarchive-test-suite
A set of test files for web archiving.
flashfreeze
A rapid web page analyser and archiver.
webarchive-wat-mining
WAT (web archive transform) metadata mining
python-warcwriterpool
Hopefully offsetting some of the difficulties of writing to WARCs (multiple open files, size limits, etc.).
SentimentalJ
A sentiment analysis module for Node.js
webarchive-fuse
Use FUSE-J to mount web archive files as filesystems.
file-archive-recordreader
File Archive RecordReader
language-detection
Experimenting with https://code.google.com/p/language-detection/
python-webhdfs
Python wrapper around Hadoop's WebHDFS interface.
ukwa.github.com
UK Web Archive GitHub Homepage
boards
Meta-project
bootstrap
HTML, CSS, and JS toolkit from Twitter
docker-warcprox-squid
A Squid setup suitable for scaling out warcprox
jruby-whois
Ruby's Whois gem wrapped up as a Maven dependency for Java code, via JRuby.
lucene-solr
Mirror of Apache Lucene & Solr
monitrix-bdt
Monitrix: Block Detection Tool
monitrix-imaqa
Monitrix: Image-based Quality Assurance module
openwayback-access-control
Web access control (exclusion oracle) tools for optional use with the Wayback Machine
proto-col
Prototype Collections Browser
warctools
warctools
webrender-har-daemon
Daemon intended to monitor a queue to which Heritrix will submit URLs. On receipt, the daemon submits each URL to a web service (currently via django-phantomjs) and stores the response, a modified HAR record, in a WARC file.
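
The webrender-har-daemon description outlines a small pipeline: take URLs from a queue, render each through a web service, and store the resulting HAR in a WARC file. The Python sketch below illustrates that flow under stated assumptions and is not the project's own code. The standard-library `queue.Queue` stands in for the real queue that Heritrix submits to. `render` is a hypothetical callable standing in for the django-phantomjs service. Storing the HAR as an uncompressed WARC/1.0 `metadata` record is also an assumption about the record type.

```python
import json
import queue
import uuid
from datetime import datetime, timezone
from io import BytesIO


def warc_record(target_uri, payload, content_type="application/json"):
    """Serialise one uncompressed WARC/1.0 record as bytes.

    Assumption: the HAR is stored as a 'metadata' record; the real
    project may use a different WARC-Type.
    """
    headers = [
        "WARC/1.0",
        "WARC-Type: metadata",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "WARC-Date: " + datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header block, blank line, content block, then two CRLFs close the record.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8") + payload + b"\r\n\r\n"


def run_daemon(url_queue, render, out, stop=None):
    """Take URLs off url_queue, render each to a HAR dict, append it to `out`.

    `render` is a hypothetical stand-in for the rendering web service.
    Putting the `stop` sentinel on the queue ends the loop.
    Returns the number of records written.
    """
    written = 0
    while True:
        url = url_queue.get()
        try:
            if url is stop:
                break
            har = render(url)
            payload = json.dumps(har).encode("utf-8")
            out.write(warc_record(url, payload))
            written += 1
        finally:
            # Mark the item handled even if rendering raised.
            url_queue.task_done()
    return written


if __name__ == "__main__":
    # Demo with a fake renderer that returns a minimal HAR-like dict.
    q = queue.Queue()
    for u in ["http://example.com/", "http://example.org/"]:
        q.put(u)
    q.put(None)  # sentinel: stop after the two URLs
    buf = BytesIO()
    n = run_daemon(q, lambda u: {"log": {"entries": [{"request": {"url": u}}]}}, buf)
    print(n, "records,", len(buf.getvalue()), "bytes")
```

Passing the renderer and the output stream in as arguments keeps the loop independent of any particular queue broker, rendering service or WARC library, so each can be swapped out separately.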
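
The python-warcwriterpool description names two practical problems when writing WARCs: multiple open files and size limits. One common way to handle both, sketched below in Python, is a fixed pool of writers. Each writer owns one open file and starts a new file when the next record would push it past a byte limit. The class names, the file-naming scheme and the use of plain `.warc` files without gzip compression are all hypothetical. This is not the repository's API.

```python
import os
import queue
import tempfile


class RotatingWriter:
    """Append pre-serialised WARC records to a file, starting a new file
    once the next record would exceed max_bytes (hypothetical design)."""

    def __init__(self, directory, prefix, max_bytes):
        self.directory = directory
        self.prefix = prefix
        self.max_bytes = max_bytes
        self.serial = 0
        self.handle = None
        self.size = 0
        self.paths = []

    def _open_next(self):
        if self.handle is not None:
            self.handle.close()
        # Hypothetical naming: <prefix>-<serial>.warc (no gzip compression).
        path = os.path.join(self.directory, f"{self.prefix}-{self.serial:05d}.warc")
        self.serial += 1
        self.handle = open(path, "wb")
        self.size = 0
        self.paths.append(path)

    def write(self, record):
        # Rotate before writing if this record would pass the limit. A record
        # larger than max_bytes still gets written, alone in its own file.
        if self.handle is None or (self.size > 0 and self.size + len(record) > self.max_bytes):
            self._open_next()
        self.handle.write(record)
        self.size += len(record)

    def close(self):
        if self.handle is not None:
            self.handle.close()
            self.handle = None


class WriterPool:
    """A fixed set of writers; concurrent callers never share an open file."""

    def __init__(self, directory, size, max_bytes):
        self._idle = queue.Queue()
        self._all = []
        for i in range(size):
            writer = RotatingWriter(directory, f"pool{i}", max_bytes)
            self._all.append(writer)
            self._idle.put(writer)

    def write(self, record):
        writer = self._idle.get()  # blocks until some writer is free
        try:
            writer.write(record)
        finally:
            self._idle.put(writer)

    def close(self):
        for writer in self._all:
            writer.close()

    def paths(self):
        return [p for writer in self._all for p in writer.paths]


if __name__ == "__main__":
    # Demo: a tiny 10-byte limit forces rotation; 3 records of 4 bytes -> 2 files.
    with tempfile.TemporaryDirectory() as d:
        pool = WriterPool(d, size=1, max_bytes=10)
        for _ in range(3):
            pool.write(b"abcd")
        pool.close()
        print([os.path.basename(p) for p in pool.paths()])
```

Checking the limit before each write keeps every record whole inside a single file, which matters because a WARC record split across two files would be unreadable.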