trandoshan-io / crawler

Go process used to crawl websites

crawler

Crawler is a program written in Go and designed to crawl websites.

features

  • uses a Tor SOCKS proxy to crawl hidden services
  • fast, built on valyala/fasthttp (up to 10x faster than net/http)
  • extracts both absolute and relative URLs
  • uses a scalable messaging protocol (NATS)
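
To illustrate what extracting both absolute and relative URLs involves, here is a minimal sketch using only the Go standard library. It is not the crawler's actual code: the function name extractURLs, the regular expression, and the example addresses are assumptions made for illustration, and a production crawler would more likely use an HTML tokenizer than a regular expression.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// hrefPattern is a deliberately simple matcher for href="..." attributes.
// It is an assumption for this sketch, not the crawler's real pattern.
var hrefPattern = regexp.MustCompile(`href="([^"]+)"`)

// extractURLs (hypothetical helper) returns every link found in body as an
// absolute URL, resolving relative links against pageURL.
func extractURLs(pageURL, body string) ([]string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil, err
	}
	var found []string
	for _, m := range hrefPattern.FindAllStringSubmatch(body, -1) {
		ref, err := url.Parse(m[1])
		if err != nil {
			continue // skip links that cannot be parsed
		}
		// ResolveReference leaves absolute URLs untouched and resolves
		// relative ones against the page they were found on.
		found = append(found, base.ResolveReference(ref).String())
	}
	return found, nil
}

func main() {
	body := `<a href="/about">About</a> <a href="faq.html">FAQ</a> <a href="http://other.onion/page">Other</a>`
	urls, err := extractURLs("http://example.onion/docs/index.html", body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Resolving every link against the page URL means downstream consumers only ever see absolute URLs, whatever form the link took in the page.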

how it works

  • The crawler process connects to a NATS server (specified by the NATS_URI environment variable) and sets up a subscriber for messages with the subject todoSubject
  • When a URL is received, the crawler starts crawling it
  • When crawling is done, the crawler publishes the page content to the NATS server with the subject contentSubject and the URLs it found with the subject doneSubject
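
The message flow described above can be sketched as follows. This is a rough illustration, not the project's implementation: the bus type is a hypothetical in-memory stand-in for a NATS connection (the real process would connect to the server named by NATS_URI), fetchFunc is a hypothetical hook for the actual crawling step, the subject string values are assumed, and publishing one message per found URL is also an assumption.

```go
package main

import "fmt"

// Subject names come from the description above; the string values are assumed.
const (
	todoSubject    = "todoSubject"
	contentSubject = "contentSubject"
	doneSubject    = "doneSubject"
)

// bus is a hypothetical in-memory stand-in for a NATS connection.
type bus struct {
	handlers map[string][]func(data string)
}

func newBus() *bus {
	return &bus{handlers: make(map[string][]func(string))}
}

// Subscribe registers a handler for every message published on subject.
func (b *bus) Subscribe(subject string, h func(data string)) {
	b.handlers[subject] = append(b.handlers[subject], h)
}

// Publish delivers data synchronously to every handler of subject.
func (b *bus) Publish(subject, data string) {
	for _, h := range b.handlers[subject] {
		h(data)
	}
}

// fetchFunc is a hypothetical hook that crawls one page and returns its
// content together with the URLs found in it.
type fetchFunc func(pageURL string) (content string, found []string)

// startCrawler wires the steps together: wait for a URL on todoSubject,
// crawl it, then publish the content and the found URLs.
func startCrawler(b *bus, fetch fetchFunc) {
	b.Subscribe(todoSubject, func(pageURL string) {
		content, found := fetch(pageURL)
		b.Publish(contentSubject, content)
		for _, u := range found {
			b.Publish(doneSubject, u) // one message per URL (assumption)
		}
	})
}

func main() {
	b := newBus()
	b.Subscribe(contentSubject, func(c string) { fmt.Println("content:", c) })
	b.Subscribe(doneSubject, func(u string) { fmt.Println("found:", u) })

	// Stub fetcher so the sketch runs without network access.
	startCrawler(b, func(pageURL string) (string, []string) {
		return "<html>stub for " + pageURL + "</html>", []string{pageURL + "/a", pageURL + "/b"}
	})

	b.Publish(todoSubject, "http://example.onion")
}
```

Because the crawler only talks to subjects, other processes can feed it URLs or consume its output without knowing anything about how pages are fetched.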

License:GNU General Public License v3.0


Languages

Go 96.2%, Dockerfile 3.8%