Parsl / parsl

Parsl - a Python parallel scripting library

Home Page: http://parsl-project.org

Use Parsl with very long task generators

WardLT opened this issue

Is your feature request related to a problem? Please describe.
I am using Parsl to run tasks from a generator that produces more tasks than I can hold in memory. The results are also too large to keep in memory, so I write them to disk instead.

The code looks something like this:

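import parsl

dfk = parsl.dfk()  # the DataFlowKernel started by parsl.load(config)

for task in evil_generator():
    task_future = do_work(task)  # do_work is a Parsl app: returns a future without waiting
    write_result(task, task_future)

dfk.wait_for_current_tasks()  # block until every submitted task has finished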

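Because do_work returns a future immediately, the loop submits tasks as fast as evil_generator yields them. If evil_generator produces too many objects, the outstanding tasks use too much memory on the login node of my cluster.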

Describe the solution you'd like
I would like Parsl to block on task submission when too many tasks are already in progress.
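As a rough illustration of the requested behaviour, here is a minimal sketch of a wrapper that blocks the caller while too many futures are outstanding. `throttled` is a hypothetical helper, not part of Parsl's API. It assumes the wrapped function returns a `concurrent.futures.Future` (Parsl's `AppFuture` is a subclass of it), and the demo uses a `ThreadPoolExecutor` in place of a Parsl app so the sketch runs on its own.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps
from typing import Callable


def throttled(submit: Callable[..., Future], max_in_flight: int) -> Callable[..., Future]:
    """Wrap a future-returning function (hypothetical helper, not Parsl API) so
    that calling it blocks while max_in_flight of its futures are unfinished."""
    slots = threading.BoundedSemaphore(max_in_flight)

    @wraps(submit)
    def wrapper(*args, **kwargs) -> Future:
        slots.acquire()  # block the caller until a slot frees up
        try:
            fut = submit(*args, **kwargs)
        except BaseException:
            slots.release()  # submission itself failed: give the slot back
            raise
        # Return the slot when the task finishes, whether it succeeded or failed.
        fut.add_done_callback(lambda _: slots.release())
        return fut

    return wrapper


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        square = throttled(lambda x: pool.submit(pow, x, 2), max_in_flight=8)
        futures = [square(i) for i in range(100)]
        print(sum(f.result() for f in futures))  # 328350
```

With Parsl, usage would presumably look like `do_work = throttled(do_work, 10000)`. This is the same semaphore idea as the workaround described in this issue, packaged so that the slot is also returned if submission itself raises.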

Describe alternatives you've considered
I have something horrific that uses a semaphore to hold back new submissions until earlier tasks have finished:

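from threading import Semaphore

import parsl

sema = Semaphore(10000)  # allow at most 10,000 unfinished tasks
for task in evil_generator():
    sema.acquire()  # wait here once 10,000 tasks are in flight
    task_future = do_work(task)
    # Free a slot when the task finishes, whether it succeeded or failed
    task_future.add_done_callback(lambda _: sema.release())
    write_result(task, task_future)

parsl.dfk().wait_for_current_tasks()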
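A possible alternative to the semaphore, sketched under the assumption that results can be handled in the main thread: keep a bounded window of pending futures and use `concurrent.futures.wait(..., return_when=FIRST_COMPLETED)` to pull finished ones out before submitting more. `bounded_map` is a hypothetical name, not part of Parsl. Since Parsl's `AppFuture` subclasses `concurrent.futures.Future`, `wait` and `as_completed` should accept Parsl futures, but the demo uses a `ThreadPoolExecutor` so it is self-contained.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, as_completed, wait
from typing import Callable, Dict, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def bounded_map(
    submit: Callable[[T], Future], items: Iterable[T], max_in_flight: int
) -> Iterator[Tuple[T, object]]:
    """Submit one task per item, yielding (item, result) pairs as tasks finish,
    with never more than max_in_flight unfinished futures alive at once."""
    pending: Dict[Future, T] = {}  # outstanding future -> the item it came from
    for item in items:
        if len(pending) >= max_in_flight:
            # Window is full: wait for at least one task to finish, then drain it.
            done, _ = wait(set(pending), return_when=FIRST_COMPLETED)
            for fut in done:
                # result() re-raises the task's exception if it failed.
                yield pending.pop(fut), fut.result()
        pending[submit(item)] = item
    # The input is exhausted; drain the remaining tasks in completion order.
    for fut in as_completed(list(pending)):
        yield pending.pop(fut), fut.result()


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        pairs = bounded_map(lambda x: pool.submit(pow, x, 2), range(100), max_in_flight=8)
        print(sum(result for _, result in pairs))  # 328350
```

In the loop from this issue, this would become something like `for task, result in bounded_map(do_work, evil_generator(), 10000): write_result(task, result)`, assuming a `write_result` variant that takes the result rather than the future. Because the input is consumed lazily, at most `max_in_flight` items are pulled from the generator ahead of their results, and each finished future is dropped as soon as its result is yielded.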

Additional context
The problem at hand: https://github.com/HydrogenStorage/molecular-stability-computer/blob/main/compute_emin.py#L182