dashbitco / flow

Computational parallel flows on top of GenStage

Home Page: https://hexdocs.pm/flow

Flow.fork_join? or I may be thinking of this wrong

stevehebert opened this issue

I've been playing with an extension for Flow that is either a good idea or I'm thinking about it wrong.

Say I have 2 or more tasks that must be called, and these tasks are not CPU bound (they are IO bound), so they are best run in parallel.

This could be accomplished by:

  def fork_join(flow, fork1_fun, fork2_fun, join_fun) when is_function(fork1_fun, 1) and is_function(fork2_fun, 1) and is_function(join_fun, 2)

One problem I see is that it would take multiple additional signatures to accept an arbitrary number [n] of simultaneous forks to join.

Thoughts? Valuable? Other ways to think about it? I know I'm applying Rx-isms here which might not be right. I'm certainly happy to create a pull request if it helps.

I don't think it belongs in Flow. Flow is more about the data processing and high-level operations on the data. It is closer to Apache Spark than Rx.

What you proposed can be achieved with streams + Task.async_stream, unless you are doing recursive feedback as you traverse (as in most graph algorithms). If you do need recursion + feedback, then GenStage is probably your starting point, not Flow.
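As a rough illustration of the streams + Task.async_stream approach, here is a minimal sketch. The module name `ForkJoinSketch`, the `fork_join/3` helper, and the two "slow" functions are hypothetical and are not part of Flow or the standard library. Taking a list of fork functions instead of separate arguments is one possible way to avoid needing a signature per arity, and the 5-second timeout is an arbitrary assumption.

```elixir
defmodule ForkJoinSketch do
  # Hypothetical helper, not part of Flow or the Elixir standard library.
  # Runs every function in `fork_funs` on `item` concurrently, then passes
  # the list of results (in the same order as `fork_funs`) to `join_fun`.
  def fork_join(item, fork_funs, join_fun)
      when is_list(fork_funs) and is_function(join_fun, 1) do
    fork_funs
    |> Task.async_stream(fn fun -> fun.(item) end,
      # One task per fork; max_concurrency must be a positive integer.
      max_concurrency: max(length(fork_funs), 1),
      # Assumed timeout; tune for the real IO calls.
      timeout: 5_000
    )
    # Results arrive as {:ok, value}, in input order by default.
    |> Enum.map(fn {:ok, result} -> result end)
    |> join_fun.()
  end
end

# Two stand-ins for IO-bound calls (the sleep simulates waiting on IO).
slow_double = fn x ->
  Process.sleep(100)
  x * 2
end

slow_square = fn x ->
  Process.sleep(100)
  x * x
end

results =
  1..3
  |> Stream.map(fn item ->
    ForkJoinSketch.fork_join(item, [slow_double, slow_square], fn [a, b] -> a + b end)
  end)
  |> Enum.to_list()

IO.inspect(results)
```

In this sketch the items themselves are still processed one at a time; only the forks for a single item run concurrently. If the items should also be processed concurrently, the outer `Stream.map` could be replaced with another `Task.async_stream` call.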

Do experiment with things a bit and please let me know how it goes. Thanks for pinging and asking!