trandoshan-io / scheduler

Go orchestrator process used to schedule website parsing

scheduler

Scheduler is a program written in Go that orchestrates resource parsing.

features

  • uses a scalable messaging protocol (NATS)

how it works

  • The Scheduler process connects to a NATS server (specified by the env variable NATS_URI) and sets up a subscriber for messages with subject doneSubject
  • When a URL is received, the scheduler applies a list of crawling rules to determine whether the resource should be crawled
  • If the resource should be crawled, the scheduler sends the URL to NATS with subject todoSubject for the crawlers
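
The flow above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the names publisher, rule, handleDone and memoryPublisher are hypothetical, and the literal subject strings are assumed from the names used in this README. The messaging client is hidden behind a small interface so the sketch runs without a NATS server.

```go
package main

// Subject names as used in this README; the literal values are an assumption.
const (
	doneSubject = "doneSubject" // subject the scheduler listens on
	todoSubject = "todoSubject" // subject the crawlers listen on
)

// publisher is the only capability the sketch needs from the messaging client.
// (Hypothetical interface; the nats.go *nats.Conn type has a Publish method
// with this signature.)
type publisher interface {
	Publish(subject string, data []byte) error
}

// rule reports whether a URL should be crawled. (Hypothetical type.)
type rule func(url string) bool

// handleDone processes one URL received on doneSubject. It forwards the URL
// to todoSubject only if every crawling rule accepts it, and reports whether
// the URL was forwarded.
func handleDone(pub publisher, rules []rule, url string) (bool, error) {
	for _, accept := range rules {
		if !accept(url) {
			return false, nil // at least one rule rejected the URL
		}
	}
	if err := pub.Publish(todoSubject, []byte(url)); err != nil {
		return false, err
	}
	return true, nil
}

// memoryPublisher records published messages in memory so the sketch can be
// tried without a running NATS server. (Hypothetical helper.)
type memoryPublisher struct {
	sent []string
}

// Publish stores the message as "subject payload" and never fails.
func (m *memoryPublisher) Publish(subject string, data []byte) error {
	m.sent = append(m.sent, subject+" "+string(data))
	return nil
}
```

With the nats.go client, a connection obtained from nats.Connect using the value of NATS_URI could be passed as pub, and handleDone could be called from the callback registered with Subscribe on doneSubject, using the message payload as the URL.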

crawling rules

Here are the rules that determine whether a resource is crawled:

  • The URL has not already been crawled
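
A minimal sketch of such a rule, assuming crawled URLs are simply remembered in memory (the project may well use a persistent store instead). The names seenSet, newSeenSet and shouldCrawl are hypothetical. For simplicity the sketch marks a URL as crawled the first time it is checked.

```go
package main

import "sync"

// seenSet remembers which URLs have already been handed out for crawling.
// (Hypothetical type; storage is in memory only, which is an assumption.)
type seenSet struct {
	mu   sync.Mutex
	urls map[string]struct{}
}

// newSeenSet returns an empty set ready for use.
func newSeenSet() *seenSet {
	return &seenSet{urls: make(map[string]struct{})}
}

// shouldCrawl returns true the first time a URL is checked and false on
// every later check. It is safe to call from several goroutines.
func (s *seenSet) shouldCrawl(url string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, seen := s.urls[url]; seen {
		return false
	}
	s.urls[url] = struct{}{}
	return true
}
```

The mutex is there because message callbacks may run concurrently; without it, two identical URLs arriving at the same time could both pass the check.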

About

License: GNU General Public License v3.0


Languages

Go 88.4%, Dockerfile 11.6%