cgarciae / pypeln

Concurrent data pipelines in Python

Home Page: https://cgarciae.github.io/pypeln

maxsize not being respected for process.map

robdmc opened this issue

Hello.
First of all, let me just say that you changed my world yesterday when I found pypeln. I've wanted exactly this for a very long time. Thank you for writing it!!

Since I'm a brand-new user I might be misunderstanding, but I think I may have found a bug. I am running the following:

  • conda Python 3.6.8
  • pypeln==0.4.4
  • JupyterLab, with the following installed to view progress bars:

pip install ipywidgets
jupyter labextension install @jupyter-widgets/jupyterlab-manager

Here is the code I am running:

from tqdm.auto import tqdm
import pypeln as pyp
import time

in_list = list(range(300))
bar1 = tqdm(total=len(in_list), desc='stage1')
bar2 = tqdm(total=len(in_list), desc='stage2')
bar3 = tqdm(total=len(in_list), desc='stage3')

def func1(x):
    time.sleep(.01)
    bar1.update()
    return x

# Runs in worker processes, so it can't update a tqdm bar owned by this
# process; the func2_monitor thread stage below does that instead.
def func2(x):
    time.sleep(.2)
    return x

def func2_monitor(x):
    bar2.update()
    return x

def func3(x):
    time.sleep(.6)
    bar3.update()
    return x

(
    in_list
    | pyp.thread.map(func1, maxsize=1, workers=1)
    | pyp.process.map(func2, maxsize=1, workers=2)
    | pyp.thread.map(func2_monitor, maxsize=1, workers=1)
    | pyp.thread.map(func3, maxsize=1, workers=1)
    | list
);

This code runs the stages while showing a progress bar for each node as it processes data. Here is what I am seeing:

[Screenshot (2020-09-22): the stage1 progress bar runs far ahead of stage2 and stage3.]

It appears that the first stage is consuming the entire source without respecting the maxsize argument. If this is expected behavior, I would like to understand more.
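For context, here is my understanding of what maxsize is supposed to do (a minimal sketch with a plain queue.Queue, not pypeln's actual internals): a bounded inter-stage queue makes a fast producer block once the queue is full, so the source is drained only as fast as the slowest downstream stage consumes.

```python
import queue
import threading
import time

# Bounded queue between a fast producer and a slow consumer:
# the producer blocks on put() while the queue is full (backpressure).
q = queue.Queue(maxsize=1)
produced = []

def producer():
    for x in range(10):
        q.put(x)          # blocks while the queue is full
        produced.append(x)
    q.put(None)           # sentinel: no more items

def consumer(out):
    while True:
        item = q.get()
        if item is None:
            break
        time.sleep(0.01)  # slow downstream stage
        out.append(item)

out = []
t = threading.Thread(target=producer)
t.start()
consumer(out)
t.join()

# The producer never runs more than a couple of items ahead of the
# consumer, so the source is not consumed all at once.
print(out)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With maxsize=1 on every stage I would expect the bars above to advance roughly in lockstep, which is not what the screenshot shows.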

Thank you.

Hey @robdmc !

Sorry for the late response; for some reason I overlooked this issue. I will look into it. Thanks for the detailed example!

I think I found the culprit: when converting from one stage type to another, there is an internal use of .to_iterable that wasn't taking the maxsize argument into account.
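A sketch of the bug pattern being described (all names here are illustrative, not pypeln's real code): a stage-to-stage conversion that exposes a stage as an iterable via a queue loses backpressure if that queue is created unbounded instead of with the stage's maxsize.

```python
import queue
import threading

def to_iterable(source, maxsize=0):
    """Expose `source` as an iterator backed by a queue.

    maxsize=0 means unbounded (the buggy behavior: the feeding thread
    drains the whole source immediately); a positive maxsize makes the
    feeding thread block, preserving backpressure. Illustrative only.
    """
    q = queue.Queue(maxsize=maxsize)
    DONE = object()  # sentinel marking the end of the source

    def feed():
        for item in source:
            q.put(item)  # blocks when the queue is full (if bounded)
        q.put(DONE)

    threading.Thread(target=feed, daemon=True).start()
    while True:
        item = q.get()
        if item is DONE:
            return
        yield item

print(list(to_iterable(range(5), maxsize=1)))  # [0, 1, 2, 3, 4]
```

Under this reading, the fix is simply threading the stage's maxsize through to the queue created during the conversion.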

@robdmc Fixed in version 0.4.6, please update :)