rohit-dua / BUB

BUB : Book Uploader Bot

Home Page: http://tools.wmflabs.org/bub/

Define multiple workers in config and spawn one job per item

nemobis opened this issue

Currently we have

tools.bub@tools-bastion-01:~/public_html/BUB/bot$ ls *py
mass_worker_1.py  mass_worker_2.py  mass_worker_3.py  mass_worker.py  upload_checker.py  worker.py

This duplicates code, which is just ugly. The bigger risk is that each worker runs indefinitely. It would be better to have a single "permanent" worker that reads its settings (concurrency etc.) from a configuration file and then spawns a labs grid job for each item, or at least for each download, so that the jobs always use different IPs and are less likely to be blocked.
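A rough sketch of what such a dispatcher could look like, only to make the proposal concrete. Everything named here is an assumption rather than existing BUB code: the JSON keys `concurrency` and `submit_command`, the use of `jsub` as the grid submission command, and the idea that `worker.py` accepts an item ID as its argument. The sketch only builds the command lines; actually running them (for example with `subprocess.run`) would also assume that jobs can be submitted from wherever the dispatcher runs.

```python
import json


def load_config(text):
    """Parse a JSON config string.

    The keys 'concurrency' and 'submit_command' are hypothetical,
    chosen for this sketch only.
    """
    raw = json.loads(text)
    return {
        "concurrency": int(raw.get("concurrency", 1)),
        # Assumption: 'jsub' is the command used to submit a grid job.
        "submit_command": list(raw.get("submit_command", ["jsub"])),
    }


def plan_jobs(config, item_ids, worker_script="worker.py"):
    """Return one command line per item, capped at the configured concurrency.

    Assumption: worker_script takes a single item ID as its argument.
    """
    batch = item_ids[: config["concurrency"]]
    return [
        config["submit_command"] + [worker_script, str(item_id)]
        for item_id in batch
    ]
```

With a config of `{"concurrency": 2}` and three queued items, `plan_jobs` would return two command lines and leave the third item for the next pass of the permanent worker.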

I don't think grid jobs can spawn new grid jobs.
So each worker is now started from the single mass_worker.py, with the worker number as argument, for example:

tools.bub@tools-bastion-01:~/public_html/BUB/bot$ ./mass_worker.py 1
tools.bub@tools-bastion-01:~/public_html/BUB/bot$ ./mass_worker.py 2

This avoids duplicate code.

tools.bub@tools-bastion-01:~/public_html/BUB/bot$ ls *.py
mass_worker.py  upload_checker.py  worker.py
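For illustration, a single script can pick up its worker number from the command line roughly like this. This is a minimal sketch, not the actual mass_worker.py: the function names and the `mass_queue_N` naming scheme are hypothetical, and the real bot may map worker numbers to its queues differently.

```python
import argparse


def parse_worker_number(argv):
    """Read the worker number from the argument list (e.g. ['1'])."""
    parser = argparse.ArgumentParser(
        description="Hypothetical entry point for a numbered mass worker"
    )
    parser.add_argument("worker_number", type=int,
                        help="which worker instance this process is")
    return parser.parse_args(argv).worker_number


def queue_name(worker_number):
    """Map a worker number to the queue it should consume.

    Hypothetical naming scheme, used here only to show how one script
    can replace mass_worker_1.py, mass_worker_2.py, and so on.
    """
    return "mass_queue_%d" % worker_number


def main(argv):
    """Resolve the worker number and report which queue it would serve."""
    number = parse_worker_number(argv)
    return "worker %d reading from %s" % (number, queue_name(number))
```

Calling `main(sys.argv[1:])` at the bottom of the script would then make `./mass_worker.py 1` and `./mass_worker.py 2` run the same code against different queues.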