Andy Jackson's repositories
anjackson.github.io
My personal website
the-turing-way
Host repository for The Turing Way: a how-to guide for reproducible data science
digipres.github.io
Auto-generated static website for digipres.org
crawlcache
Experimental crawler proxy using a WARC-based cache.
notebook
An experiment with managing Markdown docs.
wikipedia-sopa-blackout
A web archive of the Wikipedia homepage during the 2012 SOPA Blackout
digipres-notebook
Open notebook for digital preservation stuff hosted by GitBook
awesome-digital-preservation
Carefully curated list of awesome digital preservation resources.
drumknott
An experimental clerk.
datasette-lite
Datasette running in your browser using WebAssembly and Pyodide
awesome-web-archiving
An Awesome List for getting started with web archiving
ipywardley
Bringing Wardley Map magic to Jupyter notebooks
ukwa-reports
Generating Reports
outbackcdx
Web archive index server based on RocksDB
warcit
Convert Directories, Files and ZIP Files to Web Archives (WARC)
golem
Experimental crawler using Scrapy and Selenium
ukwa-monitor
Dashboard and monitoring system for the UK Web Archive
browsertrix-crawler
Run a high-fidelity browser-based crawler in a single Docker container
sphinx-comments
hypothes.is interaction layer with Sphinx
rclone-trials
Experimenting with Rclone and how it works with HDFS
using-ffmpeg
Containerised ffmpeg and example Jupyter notebooks.
scrapy-url-frontier
A Scrapy module for URL Frontier integration
anjackson
GitHub profile README
ukwa-manage
Shepherding our web archives from crawl to access.
ukwa-services
Deployment configuration for all UKWA services stacks.