andersea / slurry

Python async data processing microframework

Examples of use?

inklesspen opened this issue

Hi, I think slurry is good for my use case, but I cannot tell for sure, because there are not enough examples of using the various parts of the library in combination.

I would particularly like to know:

  • the best way to have a pipeline take input from a Trio memory channel
  • the best way to have a pipeline tap output into a Trio memory channel
  • how to enable/disable sections of a pipeline on the fly

Hi! Thanks for taking a look at slurry. I agree that there aren't enough examples. I will keep this issue open as a reminder to expand the documentation in this regard.

In the meantime, I can provide you with these simple examples:

  1. A pipeline takes any async iterable as input. Since a trio memory channel is an async iterable, it can be used as an input directly.
import trio
from slurry import Pipeline
from slurry.sections import Map

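async def generate_ints(send_channel: trio.MemorySendChannel):
    # Produce an increasing integer once per second, forever.
    i = 0
    while True:
        await send_channel.send(i)
        await trio.sleep(1)
        i += 1

async def main():
    send, receive = trio.open_memory_channel(1)
    async with trio.open_nursery() as nursery:
        nursery.start_soon(generate_ints, send)
        # The receive end of the channel is an async iterable,
        # so it can be used as the pipeline source directly.
        async with Pipeline.create(
            receive,
            Map(lambda x: x+1)
        ) as pipeline, pipeline.tap() as tap:
            async for i in tap:
                print(i)
                if i > 5:
                    # Cancelling the nursery stops the generator and ends the loop.
                    nursery.cancel_scope.cancel()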

trio.run(main)

Note that the Map section can also take its input directly, so the above pipeline is equivalent to:

    async with Pipeline.create(
        Map(lambda x: x+1, receive)
    ) as pipeline, pipeline.tap() as tap:
  2. To feed output into a memory channel, create a new tap, iterate over it, and forward each item with memorychannel.send(). Keep in mind that a tap is actually a Trio MemoryReceiveChannel under the hood and can be treated as such: instead of iterating over it, you can call receive() on it if you like that pattern better.
    async with Pipeline.create(
        Map(lambda x: x+1, receive)
    ) as pipeline, pipeline.tap() as tap:
        async for item in tap:
            await mymemorychannel.send(item)
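    # This is roughly equivalent to the following. Note that tap.receive()
    # raises trio.EndOfChannel once the tap is closed, whereas the
    # async for loop above simply ends.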
    async with Pipeline.create(
        Map(lambda x: x+1, receive)
    ) as pipeline, pipeline.tap() as tap:
        while True:
            await mymemorychannel.send(await tap.receive())
  3. There is no direct way to enable or disable subsections of a pipeline. One possible solution is to rewire the pipeline dynamically at runtime using the pipeline.extend() method. Be careful to extend the pipeline while the previous configuration is still running, and discard the old 'extension' only once you are receiving items on the new one. The way the pipeline is designed right now, if an item is ready to be sent at a moment when no one is listening, slurry will close the pipeline down.
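
A different way to approach the enable/disable question, separate from the pipeline.extend() rewiring described above, is to keep the processing step permanently in place and let it consult a flag for every item. The sketch below is not slurry API: it uses only plain async generators and the standard library's asyncio to drive the demo, and the names Toggle, switchable_map and count are hypothetical helpers invented for illustration. Because the result is itself an async iterable, it could in principle serve as a pipeline input as described in item 1, but that combination is an assumption and is not exercised here.

```python
import asyncio
from typing import AsyncIterable, AsyncIterator, Callable


class Toggle:
    """Hypothetical helper (not part of slurry): a mutable on/off flag."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled


async def switchable_map(
    func: Callable[[int], int],
    source: AsyncIterable[int],
    toggle: Toggle,
) -> AsyncIterator[int]:
    """Apply func while toggle.enabled is true, otherwise pass items through."""
    async for item in source:
        # The flag is checked per item, so flipping it takes effect
        # on the next item without rebuilding anything.
        yield func(item) if toggle.enabled else item


async def count(n: int) -> AsyncIterator[int]:
    """Stand-in source that yields 0..n-1."""
    for i in range(n):
        yield i
        await asyncio.sleep(0)


async def demo() -> list:
    toggle = Toggle(enabled=True)
    results = []
    async for item in switchable_map(lambda x: x + 100, count(6), toggle):
        results.append(item)
        if len(results) == 3:
            toggle.enabled = False  # "disable" the step on the fly
    return results


print(asyncio.run(demo()))  # prints [100, 101, 102, 3, 4, 5]
```

The trade-off is that the step never actually leaves the data path; it only becomes a pass-through. That avoids the window in which no one is listening, at the cost of one flag check per item.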

Hope this helps! Happy holidays!