Toimik's repositories
WarcProtocol
Parser for WARC (Web ARChive) files
CommonCrawl
Tools for processing Common Crawl data
UrlNormalization
URL normalizer that canonicalizes (standardizes) the text representation of a URL so that differently formatted URLs can be checked for equivalence
IpAddressEnumeration
IP address enumerators
RobotsProtocol
Parsers for robots.txt (aka Robots Exclusion Standard / Robots Exclusion Protocol), Robots Meta Tag, and X-Robots-Tag
SitemapsProtocol
Parsers for sitemaps and sitemap indexes (aka Sitemaps Protocol)
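
To make the idea behind UrlNormalization concrete, here is a minimal sketch of URL canonicalization in Python using only the standard library. It is not the repository's API or its rule set. The function name `normalize_url`, the `DEFAULT_PORTS` table, and the particular rules chosen (lowercase scheme and host, drop default ports, treat an empty path as `/`, drop the fragment) are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table of default ports; extend as needed.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Return a canonical text form of `url` (illustrative rules only).

    Limitations of this sketch: userinfo (user:password@) is dropped,
    and IPv6 literal hosts are not handled.
    """
    parts = urlsplit(url)

    # Scheme and host are case-insensitive, so compare them in lowercase.
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # Keep the port only when it differs from the scheme's default.
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"

    # An empty path and "/" refer to the same resource.
    path = parts.path or "/"

    # The fragment is client-side only, so it is discarded.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


if __name__ == "__main__":
    a = normalize_url("HTTP://Example.COM:80")
    b = normalize_url("http://example.com/#top")
    print(a, b, a == b)
```

Two URLs are then treated as identical when their normalized strings compare equal. A fuller normalizer would also deal with percent-encoding case, dot segments, and query parameter ordering, which this sketch leaves out.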