web: can initialize immediate crawl jobs
Ziinc opened this issue
- A crawl job is the scope of parsing work performed by a spider. All requests and parsed items will be linked to a crawl job.
- Each job is given a job id.
For web, job management is linked to the actual crawl jobs created in the cluster.
Consider using Oban to manage persistent crawls. Crawl jobs on the cluster are not persistent, since the cluster has no persistence layer. Web job management, on the other hand, should have a persistence layer to enable more functionality, such as storing job history.
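If Oban is adopted, each crawl could be modeled as an Oban worker whose args carry the spider configuration, with persistence coming for free from the `oban_jobs` table. A minimal sketch (the `Web.CrawlWorker` and `Cluster.start_crawl/2` names are hypothetical, not part of the codebase):

```elixir
defmodule Web.CrawlWorker do
  # Each crawl job is persisted in the oban_jobs table, giving the
  # web app the persistence layer that the cluster lacks.
  use Oban.Worker, queue: :crawls, max_attempts: 1

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"spider" => spider, "start_url" => url}}) do
    # Hand off to the cluster, which runs the actual crawl; requests
    # and parsed items would be linked back to this job's id.
    Cluster.start_crawl(spider, url)
  end
end
```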
v1:
- see all running jobs
- start/stop a job
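The v1 operations above could map directly onto Oban's own API (a sketch, assuming the hypothetical `Web.CrawlWorker` module and a standard Ecto `Repo`):

```elixir
import Ecto.Query

# Start a job: enqueue a crawl; Oban.insert/1 persists it and returns the job.
{:ok, job} =
  %{"spider" => "example_spider", "start_url" => "https://example.com"}
  |> Web.CrawlWorker.new()
  |> Oban.insert()

# See all running jobs: query the oban_jobs table for executing entries.
running = Repo.all(from j in Oban.Job, where: j.state == "executing")

# Stop a job: cancel it by id.
Oban.cancel_job(job.id)
```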
v2:
- see stats for a job
- historical introspection
- restarting
- arguments
- long running jobs
- timeouts
- scheduling