xmjw / crawl-source

A utility for finding URLs to crawl

What?

Keep a log/db of the top 1M domains.

Download http://downloads.majestic.com/majestic_million.csv
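A minimal sketch of what that download step might look like in Go. It assumes the third CSV column holds the domain name (as in the current Majestic file layout, but worth verifying) and simply prints the first few entries:

```go
// fetch_domains.go — sketch: pull the Majestic Million list and read the Domain column.
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("http://downloads.majestic.com/majestic_million.csv")
	if err != nil {
		log.Fatalf("download failed: %v", err)
	}
	defer resp.Body.Close()

	r := csv.NewReader(resp.Body)
	if _, err := r.Read(); err != nil { // skip the header row
		log.Fatalf("reading header: %v", err)
	}

	count := 0
	for {
		rec, err := r.Read()
		if err != nil {
			break // io.EOF or a malformed row; good enough for a sketch
		}
		if len(rec) < 3 {
			continue
		}
		domain := rec[2] // assumed: third column is "Domain"
		if count < 5 {
			fmt.Println(domain)
		}
		count++
	}
	log.Printf("read %d domains", count)
}
```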

Plan

  • Use small Go apps as microservices
  • Scrape each site, index keywords
  • Provide a graph API to query the domains
  • For each domain, find the domains/pages it links to and fetch them (see the link-extraction sketch after this list)
  • Check the host/dns of every domain and see what else we can find. Fetch them.
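A rough sketch of the per-domain step above: fetch a page, pull out its links with golang.org/x/net/html, and run a DNS lookup on each linked host. Keyword indexing, queueing and politeness (robots.txt, rate limits) are left out, and the example.com URL is only a placeholder:

```go
// crawl_links.go — sketch: extract <a href> links from one page, then resolve each host.
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"net/url"

	"golang.org/x/net/html"
)

// extractLinks returns the href values of all <a> tags in the page at pageURL.
func extractLinks(pageURL string) ([]string, error) {
	resp, err := http.Get(pageURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	doc, err := html.Parse(resp.Body)
	if err != nil {
		return nil, err
	}

	var links []string
	var walk func(*html.Node)
	walk = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, a := range n.Attr {
				if a.Key == "href" {
					links = append(links, a.Val)
				}
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	walk(doc)
	return links, nil
}

func main() {
	links, err := extractLinks("https://example.com/")
	if err != nil {
		log.Fatal(err)
	}
	for _, l := range links {
		if u, err := url.Parse(l); err == nil && u.Host != "" {
			addrs, _ := net.LookupHost(u.Host) // the "check the host/dns" step
			fmt.Printf("%s -> %v\n", u.Host, addrs)
		}
	}
}
```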

Create a Google Cloud Bigtable dataset of every website and the graph of everything it links to.
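A hedged sketch of how one edge of that graph could be written with the cloud.google.com/go/bigtable client. The project, instance, table and column-family names are placeholders, and the layout (source domain as row key, one column per target domain) is only one possible schema:

```go
// graph_store.go — sketch: record one "domain links to domain" edge in Cloud Bigtable.
package main

import (
	"context"
	"log"

	"cloud.google.com/go/bigtable"
)

func main() {
	ctx := context.Background()

	client, err := bigtable.NewClient(ctx, "my-project", "crawl-instance")
	if err != nil {
		log.Fatalf("bigtable client: %v", err)
	}
	defer client.Close()

	tbl := client.Open("link-graph")

	// Row key = source domain; one column per target domain in the "links" family.
	mut := bigtable.NewMutation()
	mut.Set("links", "example.org", bigtable.Now(), []byte("1"))

	if err := tbl.Apply(ctx, "example.com", mut); err != nil {
		log.Fatalf("apply: %v", err)
	}
}
```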

Structure

Components that we need:

  • Something to fetch the URLs and write them to a table every n days. (Is this necessary? Once we have a set, surely we can just keep working from that.)
  • Something to fetch each site and crawl its pages for links.
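A skeleton of what the "every n days" refresher could look like as its own small service. refreshDomains is a placeholder for the CSV download/parse step shown above, and n = 7 is just an example:

```go
// refresher.go — sketch: re-fetch the domain list on a fixed interval.
package main

import (
	"log"
	"time"
)

func refreshDomains() error {
	// Placeholder: download majestic_million.csv and upsert rows into the table.
	log.Println("refreshing domain list")
	return nil
}

func main() {
	const interval = 7 * 24 * time.Hour // "every n days" — n = 7 as an example

	if err := refreshDomains(); err != nil {
		log.Printf("initial refresh failed: %v", err)
	}
	for range time.Tick(interval) {
		if err := refreshDomains(); err != nil {
			log.Printf("refresh failed: %v", err)
		}
	}
}
```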

Other things?
