web-archive

There are 36 repositories under the web-archive topic.

DO-SAY-GO / dn
💾 dn - offline full-text search and archiving for your Chromium-based browser.
archive archiver disk diskernet dn download-net internet memex search-engine web-archive web-browsing
Language:JavaScript 3854
Ray-D-Song / web-archive
Free web archiving and sharing service that runs on Cloudflare.
cloudflare cloudflare-pages d1 free hono self-hosted serverless web-archive web-archiving
Language:TypeScript 892
webrecorder / replayweb.page
Serverless replay of web archives directly in the browser
web-archiving web-archive replay-web-page web-replay wayback-machine warc service-worker wacz
Language:TypeScript 854
webrecorder / browsertrix
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
archiving cloud warc web-archive web-archiving webrecorder wacz kubernetes
Language:TypeScript 350
devanshbatham / ArchiveFuzz
Hunt down secrets in web archives for fun and profit
bughunting email-enumeration osint security-tools subdomain-enumeration subdomain-scanner web-archive
Language:Python 164
Own-Data-Privateer / hoardy-web
Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, replay, mirroring, data scraping, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.
archive backups internet internet-archiving self-hosted wayback-machine web-archiving web-archive archiver archiving web-browsing website-archive auto-save offline-reading snapshot browser-extension cli
Language:Python 100
internetarchive / cdx-summary
Summarize web archive capture index (CDX) files.
archive cdx collection nodejs python report statistics summary warc web-archive webcomponents
Language:Python 75
TarekJor / bookmark-archiver
🗄 Save an archived copy of websites from Pocket/Pinboard/Bookmarks/RSS. Outputs HTML, PDFs, and more...
pocket archive wget google-chrome browser web-browser bookmarks html-export pinboard chromium safari firefox backup headless-browser rss web-archiving web-archive preservation python headless-chrome
Language:Python 37
webis-de / archive-query-log
📜 The Archive Query Log.
information-retrieval information-retrieval-history internet-archive query-log search-engine-result-page serp wayback-machine web-archive
Language:Jupyter Notebook 31
ShaunLWM / ark
🚢 A self-hosted, personal archival application
archive web-archive archival time-machine
Language:JavaScript 21
antiufo / Shaman.Dokan.Warc
Mounts WARC files on Windows
warc dokan mount web-archive scraping fuse
Language:C# 16
YGGverse / YGGo
YGGo! Distributed Web Search Engine
crawler search-engine web spider pdo php curl fts5 open-source yggdrasil js-less parser mysql sphinx sphinxsearch distributed federative privacy-oriented web-archive alt-web
Language:PHP 15
anjackson / sliver
A tool for collecting archival slivers of the web and of web archives
web-archive web-archives web-archiving
Language:Python 14
oduwsdl / MementoMap
A Tool to Summarize Web Archive Holdings
memento web-archive profiling mementomap ukvs python
Language:Python 11
ghobs91 / Chronicl
Decentralized web archiver that distributes archives across Nostr relays
decentralized nostr web-archive web-archiving
Language:JavaScript 9
swve / gitstorykit
Build rich apps for exploring the history of Git projects with ease; used by Gitstory
git time-machine github gitstory commits archive first-commit history web-archive
Language:TypeScript 8
minch-dev / DownTheMoon
A continuation of the legacy XUL version of DownThemAll! ✔️ preserves web.archive.org timestamps, ✔️ advanced filters for mirroring remote directory trees, ✔️ UI tweaked for better UX
firefox xul xul-addon addon extension downloader download-manager downloads legacy web-archive webarchive modified date firefox-esr basilisk pale-moon palemoon preserve timestamp downthemall
Language:JavaScript 6
ysdn-info / ysdn.info
An archive of the York/Sheridan Design Program
archive design graduation-project replay-web-page sheridan-college university wacz web-archive york-university
Language:HTML 6
bottomless-archive-project / java-warc
Read Web ARChive (WARC) files in Java.
java warc web-archive library
Language:Java 5
q-m / replayweb.page-docker
Docker image for ReplayWeb.page
replay-web-page web-archive web-archiving web-replay
Language:Dockerfile 4
thiagolopes / alexandria
Backup and save websites
web-archive
Language:Python 4
ArtificialOSS / WebCrawl
Crawls the web to generate a huge dataset for training
ai artificial-intelligence commoncrawl crawler dataset-generation web-archive
Language:Python 3
ibnesayeed / utils
Miscellaneous utility scripts
utilities scripts linux shell python archiving web-archive hacktoberfest
Language:Python 3
india-ultimate / the-huddle
A mirror of The Huddle magazine
static-site ultimate-frisbee web-archive
Language:Python 3
laxika / java-warc
Read Web ARChive (WARC) files in Java.
warc web-archive java library
Language:Java 3
AndreMor8 / wubbzy-sites
Wubbzy archived sites
wubbzy restoration web-archive flash adobe-flash website static-site
Language:HTML 2
bodleian / wacksy
An experimental library for writing WACZ files
save-the-internet wacz web-archive cdxj warc
Language:Rust 2
grey-land / warc-browser
A CLI toolkit for working with web archives
chromedp devtools rod warc web-archive go golang
Language:Go 2
wdhdev / web-archiver
Easily scrape, download and preview websites.
archive archiver html html5 javascript js nodejs web web-archive web-archiving website
Language:EJS 2
paulmelnikow / wabac
A versioned cache backed by cloud storage
web-cache web-archive versioned history
Language:JavaScript 1
shadowctrl / Palaceradio
PalaceRadio | A Next.js app built from a web archive | Freelance project @upwork
built-from-archive built-from-scratch nextjs upwork web-archive web-archives
Language:JavaScript 1
wayback-if-down / wayback-if-down.github.io
Redirect to a live website or an archived version if it's down.
redirect wayback-machine web-archive
Language:HTML 1
meadowingc / waybacker
Periodically crawl a set of websites and ensure that all of their pages are archived on the Wayback Machine. Mirror of https://codeberg.org/meadowingc/waybacker
blogging web-archive
Language:Go 0
s5-dev / archiver
Tool to archive websites and other content available on the Internet on the content-addressed S5 Network
archive archiver atproto bluesky content-addressed git http selfhosted twitch web web-archive youtube
Language:Dart 0
gnomegl / cdx
Internet Archive CDX API search for historical web data [basher package]
bash basher osint wayback-machine web-archive
Language:Shell
shadowctrl / Farsky
Farsky | A Next.js app built from a web archive | Freelance project @upwork
archive-org nextjs upwork web-archive web-archives
Language:JavaScript
