Bottomless Archive Project (bottomless-archive-project)

A project about archiving anything that's available digitally.

Bottomless Archive Project's repositories

library-of-alexandria

Library of Alexandria (LoA in short) is a project that aims to collect and archive documents from the internet.

Language: Java, License: MIT, Stargazers: 109, Issues: 3, Issues: 440

java-warc

Read Web ARChive (WARC) files in Java.

Language: Java, License: Apache-2.0, Stargazers: 5, Issues: 1, Issues: 14

library-of-alexandria.github.io

The official website of the Library of Alexandria project.

Language: HTML, Stargazers: 1, Issues: 2, Issues: 0

common-crawl-client

A very lightweight client library for Common Crawl's WARC files.

Language: Java, Stargazers: 0, Issues: 2, Issues: 0
License: MIT, Stargazers: 0, Issues: 2, Issues: 0
Language: Java, License: MIT, Stargazers: 0, Issues: 2, Issues: 4

url-collector

An application that crawls the Common Crawl corpus for URLs with the specified file extensions.

Language: Java, License: MIT, Stargazers: 0, Issues: 2, Issues: 5