bhklab / med-imagetools

Transparent and reproducible medical image processing pipelines in Python.


Error with custom pipeline inside main script module (`joblib.externals.loky.process_executor.BrokenProcessPool`)

jpeoples opened this issue · comments

I was writing a custom pipeline using the library. The file is structured like:

import ...
from imgtools.pipeline import Pipeline

class MyPipeline(Pipeline):
    def __init__(self, ...):
        # setup pipeline
        super().__init__(n_jobs=-1)
    def process_one_subject(self, subject_id):
        # custom code here

def main():
    # Handle command line args and run pipeline

if __name__=="__main__": main()

This was failing with a BrokenProcessPool error on Windows 11 (full traceback below). The error has something to do with pickling for multiprocessing, and doesn't happen if n_jobs=1.

The error can be worked around by separating the script being executed from the module containing the pipeline/main function. That is, in the above example, remove

if __name__=="__main__": main()

then create a new wrapper script:

from my_pipeline_module import main

if __name__=="__main__": main()

and execute that, rather than the module itself.

It is possible that this is a Windows-specific error (see here for an example).
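For background, a rough joblib-only sketch (nothing med-imagetools specific) of why the guard matters on Windows: the loky/spawn start method re-imports the main module and unpickles the submitted callable in each worker process, so the entry point has to sit behind an if __name__ == "__main__" guard.

from joblib import Parallel, delayed

def square(x):
    return x * x

if __name__ == "__main__":
    # On Windows, each worker process re-imports this module; without the
    # __main__ guard the Parallel call would be re-executed in every child.
    results = Parallel(n_jobs=2)(delayed(square)(i) for i in range(4))
    print(results)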

I'm not sure that this is something that can be fixed within med-imagetools. If not, though, it may be worth documenting somewhere.

Traceback:

Traceback (most recent call last):
  File "F:\SimpsonLab\r01_aim2\pipeline_improvement\.venv\lib\site-packages\joblib\externals\loky\process_executor.py", line 391, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "C:\Users\jacob\AppData\Local\Programs\Python\Python38\lib\multiprocessing\queues.py", line 116, in get
    return _ForkingPickler.loads(res)
TypeError: tuple expected at most 1 argument, got 3
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\jacob\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\jacob\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "f:\simpsonlab\r01_aim2\pipeline_improvement\r01_crlm_aim2_pipeline\chi\r01_crlm_aim2\pipeline.py", line 135, in <module>
    if __name__ == "__main__": main()
  File "f:\simpsonlab\r01_aim2\pipeline_improvement\r01_crlm_aim2_pipeline\chi\r01_crlm_aim2\pipeline.py", line 133, in main
    pipeline.run()
  File "F:\SimpsonLab\r01_aim2\pipeline_improvement\.venv\lib\site-packages\imgtools\pipeline.py", line 106, in run
    Parallel(n_jobs=self.n_jobs, verbose=verbose)(
  File "F:\SimpsonLab\r01_aim2\pipeline_improvement\.venv\lib\site-packages\joblib\parallel.py", line 1098, in __call__
    self.retrieve()
  File "F:\SimpsonLab\r01_aim2\pipeline_improvement\.venv\lib\site-packages\joblib\parallel.py", line 975, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "F:\SimpsonLab\r01_aim2\pipeline_improvement\.venv\lib\site-packages\joblib\_parallel_backends.py", line 567, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\jacob\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\jacob\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable

Can you try running the custom Pipeline using n_jobs=1? It might be a bug with the multiprocessing backend. Maybe it would be helpful to allow users to select which multiprocessing backend to use in case of these incompatibilities.
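For example (untested sketch; this assumes Pipeline.run calls joblib.Parallel without an explicit backend argument, as the traceback above suggests), joblib's parallel_backend context manager might already let users sidestep the loky process pool:

from joblib import parallel_backend

from my_pipeline_module import MyPipeline  # the custom pipeline class from the workaround above

if __name__ == "__main__":
    pipeline = MyPipeline()  # constructor arguments omitted
    # Switch joblib's default backend from loky (processes) to threading for
    # any Parallel call made inside this block; the pipeline's own n_jobs
    # setting still applies.
    with parallel_backend("threading"):
        pipeline.run()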

Yes, this is correct, it doesn't happen when n_jobs=1.

@jpeoples Does this error occur if you use fewer n_jobs? I'm wondering if it just hit a hardware constraint or if it's legitimately a bug (looks like it could be related to pickling?)

I'm mostly surprised because joblib passes our Windows CI/CD tests, but it doesn't seem to be too happy here. Also, AutoPipeline runs properly on my personal PC and I haven't seen this error before.

@skim2257 -- I tried n_jobs=2 -- same error.

I now think it has something to do with the interaction of my code for this project and the multiprocessing backend, rather than a more general error. I tried making a super minimal custom pipeline and couldn't reproduce the same error at all, regardless of n_jobs.

I was originally thinking the if __name__=="__main__" block was creating a problem with the pickling, but I don't think that's the case, given that the minimal custom pipeline fails to reproduce it. I'm kind of at a loss as to what the problem is -- luckily it is easy to work around.
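For reference, the minimal pipeline I tried was roughly along these lines (simplified; it only relies on the interface shown at the top of the issue -- super().__init__(n_jobs=...), process_one_subject, and run() -- the real Pipeline base class may need additional setup such as input/output loaders):

from imgtools.pipeline import Pipeline

class MinimalPipeline(Pipeline):
    def __init__(self, n_jobs=-1):
        super().__init__(n_jobs=n_jobs)

    def process_one_subject(self, subject_id):
        # Trivial per-subject work; the body didn't seem to matter.
        print(f"processing {subject_id}")

def main():
    MinimalPipeline(n_jobs=-1).run()

if __name__ == "__main__":
    main()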

I'll close for now -- if I happen to get to the bottom of it, I'll let you know.