Bottomless Archive Project's repositories
library-of-alexandria
Library of Alexandria (LoA for short) is a project that aims to collect and archive documents from the internet.
library-of-alexandria.github.io
The official website of the Library of Alexandria project.
common-crawl-client
A very lightweight client library for reading Common Crawl's WARC files.
url-collector
An application that crawls the Common Crawl corpus for URLs with the specified file extensions.
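The extension filtering that url-collector is described as doing can be illustrated with a minimal sketch. This is not the project's actual code: the function names `has_extension` and `filter_urls` are hypothetical, and reading URLs out of Common Crawl's WARC records is left out entirely. It only shows one way to decide whether a URL's path ends in one of the wanted file extensions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def has_extension(url, extensions):
    """Return True if the URL's path ends in one of the given extensions.

    `extensions` is expected to be a set of lowercase names without dots,
    e.g. {"pdf", "epub"}. Query strings and fragments are ignored.
    """
    path = urlparse(url).path
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    return suffix in extensions


def filter_urls(urls, extensions):
    """Keep only the URLs whose path has one of the wanted extensions."""
    # Normalize the wanted extensions once: lowercase, no leading dot.
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    return [url for url in urls if has_extension(url, wanted)]


if __name__ == "__main__":
    # Hypothetical sample input; a real run would take URLs from crawl data.
    sample = [
        "https://example.com/a.pdf",
        "https://example.com/b.html?x=1.pdf",
        "https://example.com/C.PDF#page=2",
        "https://example.com/dir/",
    ]
    for url in filter_urls(sample, ["pdf"]):
        print(url)
```

Parsing the URL first, rather than checking how the raw string ends, keeps query parameters and fragments from producing false matches.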