Toimik (toimik)

Open source projects for an upcoming web search engine

Location: Singapore

Toimik's repositories

WarcProtocol

Parser for WARC (aka WebArchive) files

Language: C# | License: Apache-2.0 | Stargazers: 8 | Issues: 1 | Issues: 5

CommonCrawl

Tools for processing Common Crawl data

Language: C# | License: Apache-2.0 | Stargazers: 5 | Issues: 1 | Issues: 0

UrlNormalization

URL normalizer that canonicalizes (standardizes) the text representation of a URL so that differently formatted URLs can be checked for equivalence

Language: C# | License: Apache-2.0 | Stargazers: 4 | Issues: 1 | Issues: 0

IpAddressEnumeration

IP address enumerators

Language: C# | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

RobotsProtocol

Parsers for robots.txt (aka Robots Exclusion Standard / Robots Exclusion Protocol), Robots Meta Tag, and X-Robots-Tag

Language: C# | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

SitemapsProtocol

Parsers for sitemap / sitemap index (aka Sitemaps Protocol)

Language: C# | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

Wikimedia

Tools for processing Wikimedia Downloads data

Language: C# | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0