[Bug] Utilising pypeln appears to cause a SIGHUP
SimonBiggs opened this issue · comments
Utilising pypeln.process
With pypeln.process I am able to achieve a nice speed-up, using all 24 of my CPU cores with pleasant syntax, within my TensorFlow dataset pipeline. However, after ~70 minutes of running happily, my software receives a SIGHUP signal and subsequently shuts down in compliance.
I am using Python 3.9 with the version of pypeln from PR #78.
Tensorflow version is 2.7.0
Originally my pipeline was built in the following way, which results in no SIGHUP issues:
def generator():
    while True:
        patch_type = np.random.choice(patch_types, p=patch_frequencies)
        patch_function = patch_functions[patch_type]
        try:
            yield patch_function()
        except IndexError:
            pass


shape = _config.convert_cfg_tuple(cfg.model.shape)
output_signature = (
    tf.TensorSpec(
        shape=shape,
        dtype=tf.float32,
    ),
    tf.TensorSpec(
        shape=shape,
        dtype=tf.float32,
    ),
)
dataset = tf.data.Dataset.from_generator(
    generator, output_signature=output_signature
)
When I make the following changes to the generator function, patching in pypeln to take advantage of its multiprocessing, the SIGHUP issue occurs after about 70 minutes:
def _infinite_none_generator():
    while True:
        yield None


def _original_generator():
    while True:
        patch_type = np.random.choice(patch_types, p=patch_frequencies)
        patch_function = patch_functions[patch_type]
        try:
            yield patch_function()
        except IndexError:
            pass


patch_generator = _original_generator()


def _call_patch_generator(_none: None):
    return next(patch_generator)


stage = pl.process.map(
    _call_patch_generator,
    _infinite_none_generator(),
    workers=1,
    maxsize=2,
)


def generator():
    for item in stage:
        yield item
I certainly could have done something silly, and the issue may well be on my end; I'm still investigating. I'll report back and close this if that turns out to be the case.
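For anyone debugging something similar, a minimal sketch (an assumption on my part: a POSIX system, where SIGHUP exists) for confirming which signal is arriving before the process obeys it:

```python
import os
import signal

received = []


def log_signal(signum, frame):
    # Record the signal's name so the shutdown can be attributed.
    received.append(signal.Signals(signum).name)


# Install a handler for SIGHUP; the default action would terminate us.
signal.signal(signal.SIGHUP, log_signal)

# Simulate the mystery signal by sending SIGHUP to ourselves.
os.kill(os.getpid(), signal.SIGHUP)
print(received)  # ['SIGHUP']
```

Pointing the handler at other likely candidates (SIGTERM, SIGINT) works the same way.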
Thanks for building pypeln 🙂
Cheers 🙂,
Simon
Never mind, I take that back. I was able to isolate pypeln; it wasn't the cause.
Also, if anyone stumbles upon this, I drastically simplified my approach based on this post by Tim Peters:
https://stackoverflow.com/a/43079667/3912576
def randomly_call_a_patch_function(rng):
    patch_type = _random_choice(rng=rng, a=patch_types, p=patch_frequencies)
    patch_function = patch_functions[patch_type]
    try:
        return patch_function(rng=rng)
    except IndexError:
        return randomly_call_a_patch_function(rng=rng)


def process(output_queue: multiprocessing.Queue):
    rng = np.random.default_rng()
    while True:
        output_queue.put(randomly_call_a_patch_function(rng=rng))


workers = cfg.dataset.batch_size
output_queue = multiprocessing.Queue(maxsize=workers)


def generator():
    pool = multiprocessing.Pool(
        workers, initializer=process, initargs=(output_queue,)
    )
    try:
        while True:
            yield output_queue.get()
    finally:
        pool.close()
        pool.join()
output_signature = (
    tf.TensorSpec(
        shape=input_shape,
        dtype=tf.float32,
    ),
    tf.TensorSpec(
        shape=output_shape,
        dtype=tf.float32,
    ),
)
dataset = tf.data.Dataset.from_generator(
    generator, output_signature=output_signature
)
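The Pool-initializer trick can also be demonstrated standalone. This is a sketch under assumptions of mine (plain integers instead of patches, and a hypothetical `take` helper); one caveat: `pool.close()`/`pool.join()` would block forever in this setting, because the workers never return from their initializer, so the sketch uses `terminate()` instead:

```python
import multiprocessing


def producer(output_queue):
    # The "initializer" never returns: each pool worker becomes a
    # producer that feeds the shared queue forever.
    n = 0
    while True:
        output_queue.put(n)
        n += 1


def take(count, output_queue, workers=2):
    # Hypothetical helper: start the pool, pull `count` items off the
    # queue, then kill the producers. close()/join() would hang here,
    # since the workers never leave their initializer loops.
    pool = multiprocessing.Pool(
        workers, initializer=producer, initargs=(output_queue,)
    )
    try:
        return [output_queue.get() for _ in range(count)]
    finally:
        pool.terminate()
        pool.join()


if __name__ == "__main__":
    queue = multiprocessing.Queue(maxsize=4)
    items = take(5, queue)
    print(len(items))  # 5
```

The bounded `maxsize` on the queue is what provides back-pressure: producers block in `put()` once the consumer falls behind.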