Define multiple workers in config and spawn one job per item
nemobis opened this issue
nemobis commented
Currently we have:
tools.bub@tools-bastion-01:~/public_html/BUB/bot$ ls *py
mass_worker_1.py mass_worker_2.py mass_worker_3.py mass_worker.py upload_checker.py worker.py
This duplicates code, which is ugly. The bigger risk is that each worker runs indefinitely. It would be better to have a single "permanent" worker that reads its settings (concurrency etc.) from a configuration file and then spawns a Labs grid job for each item, or at least for each download, so that the downloads always use different IPs and are less likely to be blocked.
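To make the proposal concrete, here is a rough sketch of what the config-driven part could look like. Everything in it is an assumption for illustration: the JSON keys, the idea that worker.py accepts an item ID as an argument, and the use of a bare `jsub` as the submit command. It only builds the command lines and does not submit anything.

```python
import json

# Hypothetical configuration; the key names are assumptions, not BUB's real settings.
EXAMPLE_CONFIG = """
{
    "submit_command": ["jsub"],
    "worker_script": "./worker.py",
    "max_concurrent_jobs": 3
}
"""


def load_config(text):
    """Parse the JSON configuration and check the concurrency limit."""
    config = json.loads(text)
    if config["max_concurrent_jobs"] < 1:
        raise ValueError("max_concurrent_jobs must be at least 1")
    return config


def build_job_commands(config, item_ids):
    """Return one submit command (as an argument list) per item."""
    return [
        config["submit_command"] + [config["worker_script"], str(item_id)]
        for item_id in item_ids
    ]


def next_batch(commands, running_count, max_concurrent):
    """Return only as many pending commands as there are free slots."""
    free_slots = max(0, max_concurrent - running_count)
    return commands[:free_slots]
```

The permanent worker would then loop: check how many jobs are still running, call `next_batch`, and hand each command to something like `subprocess.run`. How the running jobs are counted is left open here.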
Rohit Dua commented
I don't think grid jobs can spawn new grid jobs.
So mass_worker now starts a new worker with the worker number as an argument, for example:
tools.bub@tools-bastion-01:~/public_html/BUB/bot$ ./mass_worker.py 1
tools.bub@tools-bastion-01:~/public_html/BUB/bot$ ./mass_worker.py 2
This avoids duplicate code.
tools.bub@tools-bastion-01:~/public_html/BUB/bot$ ls *.py
mass_worker.py upload_checker.py worker.py
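For reference, the argument handling described above might look roughly like the following. This is only a sketch, not the actual contents of mass_worker.py: the function names and the label format are made up for illustration.

```python
import argparse


def parse_worker_number(argv):
    """Read the worker number from the command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Run one mass worker, identified by its number."
    )
    parser.add_argument(
        "worker_number", type=int, help="number of this worker, e.g. 1"
    )
    args = parser.parse_args(argv)
    if args.worker_number < 1:
        # parser.error prints the message and exits with status 2.
        parser.error("worker_number must be a positive integer")
    return args.worker_number


def worker_label(worker_number):
    """Hypothetical label a worker could use for its logs or job name."""
    return "mass_worker_%d" % worker_number
```

Called as `parse_worker_number(sys.argv[1:])`, the invocation `./mass_worker.py 2` would give worker number 2, so one script covers what used to need a separate file per worker.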