There are 51 repositories under the web-archiving topic.
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Core Python Web Archiving Toolkit for replay and recording of web archives
Collect and revisit web pages.
A High-Fidelity Web Archiving Extension for Chrome and Chromium-based browsers!
CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
Automatically archive links to videos, images, and social media content from Google Sheets (and more).
Run a high-fidelity browser-based web archiving crawler in a single Docker container
Free web archiving and sharing service that runs on Cloudflare.
Serverless replay of web archives directly in the browser
Local-first, open-source AI assistant for your data. Unify tasks, notes, docs, photos, and bookmarks. Private, self-hosted, and extensible via APIs.
Archiveror will help you preserve the webpages you love. 💾
Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder)
Streaming WARC/ARC library for fast web archive IO
WarcDB: Web crawl data as SQLite databases.
Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine
Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)
Social Feed Manager user interface application.
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
The repository and website hosting the peer review process for new Programming Historian lessons
Perpetual Access To The Scholarly Record
🗄️ A simple CLI for converting WARC to Parquet.
Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, replay, mirroring, data scraping, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.
A server to collect & archive websites that also supports video downloads
Website-downloader is a powerful and versatile Python script designed to download entire websites along with all their assets. This tool allows you to create a local copy of a website, including HTML pages, images, CSS, JavaScript files, and other resources. It is ideal for web archiving, offline browsing, and web development.