Ziinc / crawldis

web: can initialize immediate crawl jobs

Ziinc opened this issue · comments

commented
  • A crawl job is the scope of parsing work performed by a spider. All requests and parsed items will be linked to a crawl job.
  • Each job is given a job id.

For web, job management is linked to the actual crawl jobs created in the cluster.

Consider using Oban to manage persistent crawls. Crawl jobs on the cluster are not persistent, since the cluster has no persistence layer. Web management, on the other hand, should have a persistence layer to enable more functionality, such as storing job history.
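A minimal sketch of what an Oban-backed crawl job could look like. Module names, the queue name, and `Crawldis.start_crawl/1` are hypothetical placeholders, not actual identifiers from this repo:

```elixir
# config/config.exs — assumes web has a Postgres-backed Ecto repo,
# which Oban uses as its persistence layer.
config :web, Oban,
  repo: Web.Repo,
  queues: [crawls: 10]

# Hypothetical worker wrapping a crawl. Args are persisted in the
# oban_jobs table, so the job survives node restarts.
defmodule Web.Workers.CrawlJob do
  use Oban.Worker, queue: :crawls, max_attempts: 3

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"start_urls" => urls}}) do
    # Hand the actual crawl off to the cluster here.
    Crawldis.start_crawl(urls)
    :ok
  end
end
```

Because the job args live in the database rather than in cluster memory, history and re-runs come mostly for free.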

v1:

  • see all running jobs
  • start/stop a job
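Both v1 items map onto primitives Oban already provides, since jobs are rows in the `oban_jobs` table. A sketch, assuming a `crawls` queue and a `Web.Repo` Ecto repo (both hypothetical names):

```elixir
import Ecto.Query

# "See all running jobs": query the oban_jobs table directly via
# the Oban.Job schema.
running =
  Oban.Job
  |> where([j], j.queue == "crawls" and j.state == "executing")
  |> Web.Repo.all()

# "Stop a job": cancel by job id (job_id is a placeholder here).
Oban.cancel_job(job_id)
```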

v2:

  • see stats for a job
  • historical introspection
  • restarting
  • arguments
  • long running jobs
  • timeouts
  • scheduling

commented

Jobber has been implemented in #11. Web needs to be able to connect to the db and schedule the jobs with Oban.
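Scheduling from web could then be a single insert of a job changeset (a sketch; `Web.Workers.CrawlJob` is a hypothetical worker name, not from #11):

```elixir
# Enqueue a persistent crawl job: Oban writes it to the database,
# and a queue worker on the connected node picks it up.
%{"start_urls" => ["https://example.com"]}
|> Web.Workers.CrawlJob.new()
|> Oban.insert()
```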