dashbitco / flow

Computational parallel flows on top of GenStage

Home Page: https://hexdocs.pm/flow

Possibly inaccurate doc about the use of partition

Arkham opened this issue

Hi all,

I was following this section of the Flow documentation regarding partition: https://hexdocs.pm/flow/Flow.html#module-partitioning

If I run this code, which doesn't have the partition step:

defmodule Test do
  def run do
    {:ok, stream} =
      "roses are red\nviolets are blue\n"
      |> StringIO.open()

    stream
    |> IO.binstream(:line)
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split(&1, " "))
    |> Flow.reduce(fn -> %{} end, fn word, acc ->
      Map.update(acc, word, 1, & &1 + 1)
    end)
    |> Enum.to_list()
  end
end

I should receive something like:

[{"roses", 1}, {"are", 1}, {"red", 1}, {"violets", 1}, {"are", 1}, {"blue", 1}]

But instead I see this:

[{"are", 2}, {"blue\n", 1}, {"red\n", 1}, {"roses", 1}, {"violets", 1}]

That's because the input is too small. Everything is sent in a single batch to a single producer/consumer, which can therefore count it correctly. Can you please send a PR that adds this clarification to the docs? Thank you!

Of course, do you think there is any way to show the advantage of using 'partition' in a simpler piece of code?

Unfortunately, you can only specify the max_demand when you use partition, so I just added a paragraph in the doc to explain that this can happen.

@Arkham you can specify max_demand on from_enumerable. :) Can you please give it a try?

I gave it a quick try locally and I got this by passing max_demand: 1 to from_enumerable:

[{"are", 1}, {"red\n", 1}, {"roses", 1}, {"are", 1}, {"blue\n", 1}, {"violets", 1}]
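A minimal sketch of that change, for anyone who wants to reproduce it. It assumes Elixir 1.12 or later (for Mix.install) and that it is run as a standalone script; the version requirement on flow is illustrative, not something stated in this thread. The only change to the module from the original report is the max_demand: 1 option.

```elixir
# Assumption: run as a script, e.g. `elixir demo.exs`; the "~> 1.0" requirement is illustrative.
Mix.install([{:flow, "~> 1.0"}])

defmodule Test do
  def run do
    {:ok, stream} = StringIO.open("roses are red\nviolets are blue\n")

    stream
    |> IO.binstream(:line)
    # max_demand: 1 makes each stage ask for one line at a time,
    # so the two lines can end up on different stages.
    |> Flow.from_enumerable(max_demand: 1)
    |> Flow.flat_map(&String.split(&1, " "))
    |> Flow.reduce(fn -> %{} end, fn word, acc ->
      Map.update(acc, word, 1, &(&1 + 1))
    end)
    |> Enum.to_list()
  end
end

IO.inspect(Test.run())
```

The exact result can vary between runs and machines: the split only shows up when more than one stage is running (the default depends on the number of schedulers online), and the order of the tuples is not guaranteed.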

Aha, that's really cool, I can add that to the doc. Should I remove the comment then?

Closing this in favor of the PR anyway. :)

I think you can keep your comment and show an example with max_demand: 1 to illustrate how you can reproduce it. :)
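Coming back to the earlier question about showing the advantage of partition in a small piece of code: a sketch under the same assumptions (Elixir 1.12 or later, run as a script, illustrative version requirement) is to keep max_demand: 1 so the lines are spread across stages, and add Flow.partition() before the reduce. Partitioning routes each word to one stage by hash, so every word should appear once in the result with its total count.

```elixir
# Assumption: run as a script, e.g. `elixir demo.exs`; the "~> 1.0" requirement is illustrative.
Mix.install([{:flow, "~> 1.0"}])

defmodule Test do
  def run do
    {:ok, stream} = StringIO.open("roses are red\nviolets are blue\n")

    stream
    |> IO.binstream(:line)
    # One line per request, so the mapper stages may each see a different line.
    |> Flow.from_enumerable(max_demand: 1)
    |> Flow.flat_map(&String.split(&1, " "))
    # Route equal words to the same stage before counting them.
    |> Flow.partition()
    |> Flow.reduce(fn -> %{} end, fn word, acc ->
      Map.update(acc, word, 1, &(&1 + 1))
    end)
    |> Enum.to_list()
  end
end

IO.inspect(Test.run())
```

With the partition step, "are" is counted as 2 in a single tuple no matter how the lines were distributed upstream. The order of the tuples is still not guaranteed, and "red\n" and "blue\n" keep their trailing newline because the lines are split on spaces only.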