Use Parsl with very long task generators
WardLT opened this issue
**Is your feature request related to a problem? Please describe.**
I am using Parsl to run tasks from a generator that produces more tasks than I can hold in memory. Because the results are also too large to keep in memory, I write them to disk as they complete.
The code is something like:

```python
for task in evil_generator():
    task_future = do_work(task)
    write_result(task, task_future)
dfk.wait_for_current_tasks()
```
If `evil_generator` produces too many objects, I use too much memory on the login node of my cluster.
**Describe the solution you'd like**
I would like Parsl to block at submission time when too many tasks are already in progress.
**Describe alternatives you've considered**
I've got something horrific where I use a semaphore to throttle the submission of tasks until earlier ones have finished:
```python
from threading import Semaphore

sema = Semaphore(10000)
for task in evil_generator():
    sema.acquire()  # blocks until a running task releases a slot
    task_future = do_work(task)
    task_future.add_done_callback(lambda x: sema.release())
    write_result(task, task_future)
dfk.wait_for_current_tasks()
```
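For reference, the same throttling pattern can be sketched without Parsl, using `concurrent.futures` as a stand-in for the Parsl executor. All names here (`do_work`, `evil_generator`, `write_result`, the in-flight limit) are placeholders mirroring the issue, not Parsl APIs:

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Semaphore, Lock

MAX_IN_FLIGHT = 8  # assumed limit for the sketch; the issue uses 10000

def do_work(x):          # stand-in for the Parsl app
    return x * 2

def evil_generator():    # stand-in for the unbounded generator
    yield from range(200)

results = []
results_lock = Lock()

def write_result(task, future):
    # Stand-in for writing to disk: record the result once it is ready,
    # without blocking the submission loop.
    def _on_done(fut):
        with results_lock:
            results.append((task, fut.result()))
    future.add_done_callback(_on_done)

sema = Semaphore(MAX_IN_FLIGHT)
with ThreadPoolExecutor(max_workers=4) as pool:
    for task in evil_generator():
        sema.acquire()                          # block while too many tasks are in flight
        fut = pool.submit(do_work, task)
        fut.add_done_callback(lambda f: sema.release())
        write_result(task, fut)
# leaving the with-block waits for all tasks, like dfk.wait_for_current_tasks()

print(len(results))  # 200
```

The key point is that `acquire()` caps the number of outstanding futures at `MAX_IN_FLIGHT`, so memory use on the submitting node stays bounded no matter how long the generator runs.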
**Additional context**
The problem at hand: https://github.com/HydrogenStorage/molecular-stability-computer/blob/main/compute_emin.py#L182