cgarciae / pypeln

Concurrent data pipelines in Python

Home Page: https://cgarciae.github.io/pypeln

maxsize not being respected for process.map

robdmc opened this issue

Hello.
First of all, let me just say that you changed my world yesterday when I found pypeln. I've wanted exactly this for a very long time. Thank you for writing it!!

Since I'm a brand-new user I might be misunderstanding, but I think I may have found a bug. I am running the following:

  • conda Python 3.6.8
  • pypeln==0.4.4
  • JupyterLab, with the following installed to view progress bars:

pip install ipywidgets
jupyter labextension install @jupyter-widgets/jupyterlab-manager

Here is the code I am running:

from tqdm.auto import tqdm
import pypeln as pyp
import time

in_list = list(range(300))
bar1 = tqdm(total=len(in_list), desc='stage1')
bar2 = tqdm(total=len(in_list), desc='stage2')
bar3 = tqdm(total=len(in_list), desc='stage3')

def func1(x):
    time.sleep(.01)
    bar1.update()
    return x

# Runs in worker processes, so it can't update a tqdm bar owned by this
# process; the func2_monitor thread stage below does that instead.
def func2(x):
    time.sleep(.2)
    return x

def func2_monitor(x):
    bar2.update()
    return x

def func3(x):
    time.sleep(.6)
    bar3.update()
    return x

(
    in_list
    | pyp.thread.map(func1, maxsize=1, workers=1)
    | pyp.process.map(func2, maxsize=1, workers=2)
    | pyp.thread.map(func2_monitor, maxsize=1, workers=1)
    | pyp.thread.map(func3, maxsize=1, workers=1)
    | list
);

This code runs the stages while showing a progress bar for each node as it processes data. Here is what I am seeing:

[Screenshot (2020-09-22): the stage1 progress bar runs far ahead of stage2 and stage3.]

It appears that the first stage is consuming the entire source without respecting the maxsize argument. If this is expected behavior, I would like to understand more.
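For context, here is my understanding of what maxsize is supposed to do (a minimal sketch with a plain queue.Queue, not pypeln's actual internals): a bounded inter-stage queue makes a fast producer block once the queue is full, so the source is drained only as fast as the slowest downstream stage consumes.

```python
import queue
import threading
import time

# Bounded queue between a fast producer and a slow consumer:
# the producer blocks on put() while the queue is full (backpressure).
q = queue.Queue(maxsize=1)
produced = []

def producer():
    for x in range(10):
        q.put(x)          # blocks while the queue is full
        produced.append(x)
    q.put(None)           # sentinel: no more items

def consumer(out):
    while True:
        item = q.get()
        if item is None:
            break
        time.sleep(0.01)  # slow downstream stage
        out.append(item)

out = []
t = threading.Thread(target=producer)
t.start()
consumer(out)
t.join()

# The producer never runs more than a couple of items ahead of the
# consumer, so the source is not consumed all at once.
print(out)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With maxsize=1 on every stage I would expect the bars above to advance roughly in lockstep, which is not what the screenshot shows.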

Thank you.

Hey @robdmc !

Sorry for the late response; for some reason I overlooked this issue. I will look into it. Thanks for the detailed example!

I think I found the culprit: when converting from one stage type to another, there is an internal use of .to_iterable that wasn't taking the maxsize argument into account.
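A sketch of the bug pattern being described (all names here are illustrative, not pypeln's real code): a stage-to-stage conversion that exposes a stage as an iterable via a queue loses backpressure if that queue is created unbounded instead of with the stage's maxsize.

```python
import queue
import threading

def to_iterable(source, maxsize=0):
    """Expose `source` as an iterator backed by a queue.

    maxsize=0 means unbounded (the buggy behavior: the feeding thread
    drains the whole source immediately); a positive maxsize makes the
    feeding thread block, preserving backpressure. Illustrative only.
    """
    q = queue.Queue(maxsize=maxsize)
    DONE = object()  # sentinel marking the end of the source

    def feed():
        for item in source:
            q.put(item)  # blocks when the queue is full (if bounded)
        q.put(DONE)

    threading.Thread(target=feed, daemon=True).start()
    while True:
        item = q.get()
        if item is DONE:
            return
        yield item

print(list(to_iterable(range(5), maxsize=1)))  # [0, 1, 2, 3, 4]
```

Under this reading, the fix is simply threading the stage's maxsize through to the queue created during the conversion.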

@robdmc Fixed in version 0.4.6, please update :)