maxsize not being respected for process.map
robdmc opened this issue · comments
Hello.
First of all, let me just say that you changed my world yesterday when I found pypeln. I've wanted exactly this for a very long time. Thank you for writing it!
Since I'm a brand-new user I might be misunderstanding, but I think I may have found a bug. I am running the following:
- conda python 3.6.8
- pypeln==0.4.4
- Running in Jupyter Lab with the following installed to view progress bars
pip install ipywidgets
jupyter labextension install @jupyter-widgets/jupyterlab-manager
Here is the code I am running
from tqdm.auto import tqdm
import pypeln as pyp
import time
in_list = list(range(300))
bar1 = tqdm(total=len(in_list), desc='stage1')
bar2 = tqdm(total=len(in_list), desc='stage2')
bar3 = tqdm(total=len(in_list), desc='stage3')
def func1(x):
    time.sleep(.01)
    bar1.update()
    return x

def func2(x):
    time.sleep(.2)
    return x

def func2_monitor(x):
    bar2.update()
    return x

def func3(x):
    time.sleep(.6)
    bar3.update()
    return x

(
    in_list
    | pyp.thread.map(func1, maxsize=1, workers=1)
    | pyp.process.map(func2, maxsize=1, workers=2)
    | pyp.thread.map(func2_monitor, maxsize=1, workers=1)
    | pyp.thread.map(func3, maxsize=1, workers=1)
    | list
);
This code runs the stages while showing a progress bar for each one as it processes data. Here is what I am seeing: the first stage appears to consume the entire source without respecting the maxsize argument. If this is expected behavior, I would like to understand why.
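For context, here is a minimal plain-Python sketch (using threading and queue directly, not pypeln itself) of the backpressure I expected maxsize=1 to provide: the producer should block once one item is buffered, rather than running through the whole source up front.

```python
import queue
import threading
import time

# Sketch of the expected behavior (not pypeln internals): a bounded
# inter-stage buffer makes the producing stage block when it gets ahead.
buf = queue.Queue(maxsize=1)   # bounded buffer between two stages
produced = []

def producer():
    for x in range(300):
        buf.put(x)             # blocks while the buffer is full
        produced.append(x)

threading.Thread(target=producer, daemon=True).start()
time.sleep(0.2)                # give the producer time to run ahead if it could
print(len(produced))           # 1: it placed one item and is now blocked

results = [buf.get() for _ in range(300)]  # consuming unblocks the producer
print(results == list(range(300)))         # True
```

With an unbounded buffer (maxsize=0) the producer would instead drain all 300 items immediately, which matches the behavior I'm seeing in the first stage.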
Thank you.
Hey @robdmc!
Sorry for the late response; for some reason I overlooked this issue. I will look into it. Thanks for the detailed example!
I think I found the culprit: when converting from one stage type to another there is an internal use of .to_iterable that wasn't taking the possibility of a maxsize into account.
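A hypothetical illustration of the kind of fix described (this is a sketch, not pypeln's actual code): a to_iterable-style helper that drains a stage through a queue should forward the caller's maxsize to that queue, since queue.Queue(maxsize=0) means unbounded and would reproduce exactly the behavior reported above.

```python
import queue
import threading

def to_iterable(stage, maxsize=0):
    """Hypothetical sketch of draining a stage into an iterator.

    Forwarding maxsize to the internal queue restores backpressure;
    dropping it (always using maxsize=0) lets the upstream stage
    consume its whole source unbounded, as described in this issue.
    """
    buf = queue.Queue(maxsize=maxsize)
    DONE = object()  # sentinel marking the end of the stage

    def worker():
        for x in stage:
            buf.put(x)  # blocks when the bounded buffer is full
        buf.put(DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        x = buf.get()
        if x is DONE:
            return
        yield x

print(list(to_iterable(range(10), maxsize=1)))  # [0, 1, ..., 9], one item buffered at a time
```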