Possibly inaccurate doc about the use of partition

Question

Arkham opened this issue 7 years ago

Hi all,

I was following this section of the Flow documentation regarding partition: https://hexdocs.pm/flow/Flow.html#module-partitioning

If I run this code which doesn't have the partition step:

defmodule Test do
  def run do
    {:ok, stream} =
      "roses are red\nviolets are blue\n"
      |> StringIO.open()

    stream
    |> IO.binstream(:line)
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split(&1, " "))
    |> Flow.reduce(fn -> %{} end, fn word, acc ->
      Map.update(acc, word, 1, & &1 + 1)
    end)
    |> Enum.to_list()
  end
end

According to the docs, I should receive something like this, with "are" counted separately in two places:

[{"roses", 1}, {"are", 1}, {"red", 1}, {"violets", 1}, {"are", 1}, {"blue", 1}]

But instead I see this, with "are" already counted correctly:

[{"are", 2}, {"blue\n", 1}, {"red\n", 1}, {"roses", 1}, {"violets", 1}]

José Valim · Answer 1 · Thu May 11 2017 23:07:06 GMT+0800 (China Standard Time)

That's because the contents are too small. Everything is sent in a single batch to a single producer/consumer, which can then count it correctly. Can you please send a PR that adds this clarification to the docs? Thank you!
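For comparison, here is a minimal sketch of the same pipeline with the partition step from the linked docs section added. The module name TestWithPartition is made up for illustration, and the snippet assumes the flow package is available as a dependency. With partitioning, each word is routed to the same reducer stage, so the counts should be merged per word no matter how the input is batched; the order of the resulting list is not guaranteed.

```elixir
defmodule TestWithPartition do
  # Hypothetical module name; same input as the snippet in the question.
  def run do
    {:ok, stream} =
      "roses are red\nviolets are blue\n"
      |> StringIO.open()

    stream
    |> IO.binstream(:line)
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split(&1, " "))
    # Route each word to a stage chosen by hashing the word, so that
    # identical words always end up in the same reducer.
    |> Flow.partition()
    |> Flow.reduce(fn -> %{} end, fn word, acc ->
      Map.update(acc, word, 1, & &1 + 1)
    end)
    |> Enum.to_list()
  end
end
```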

Ju Liu · Answer 2 · Thu May 11 2017 23:08:37 GMT+0800 (China Standard Time)

Of course, do you think there is any way to show the advantage of using 'partition' in a simpler piece of code?

José Valim · Answer 3 · Thu May 11 2017 23:32:17 GMT+0800 (China Standard Time)

Set max_demand to 1 or 2 maybe?

Ju Liu · Answer 4 · Fri May 12 2017 23:06:42 GMT+0800 (China Standard Time)

Unfortunately, you can only specify the max_demand when you use partition, so I just added a paragraph in the doc to explain that this can happen.

José Valim · Answer 5 · Fri May 12 2017 23:45:54 GMT+0800 (China Standard Time)

@Arkham you can specify max_demand on from_enumerable. :) Can you please give it a try?

José Valim · Answer 6 · Fri May 12 2017 23:47:32 GMT+0800 (China Standard Time)

I gave it a quick try locally and I got this by passing max_demand: 1 to from_enumerable:

[{"are", 1}, {"red\n", 1}, {"roses", 1}, {"are", 1}, {"blue\n", 1}, {"violets", 1}]
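A sketch of what that change might look like, assuming it is the snippet from the question with only the max_demand: 1 option added to from_enumerable (the module name TestMaxDemand is made up for illustration, and the flow package is assumed to be a dependency). With a demand of one, each line can be handed to a different stage, so "are" may be counted separately, as in the result quoted above. The exact output depends on scheduling and on the number of stages.

```elixir
defmodule TestMaxDemand do
  # Hypothetical module name; the only change from the question's code
  # is the max_demand: 1 option below.
  def run do
    {:ok, stream} =
      "roses are red\nviolets are blue\n"
      |> StringIO.open()

    stream
    |> IO.binstream(:line)
    # Ask the producer for one line at a time, so the two lines are
    # not delivered to a single stage in one batch.
    |> Flow.from_enumerable(max_demand: 1)
    |> Flow.flat_map(&String.split(&1, " "))
    |> Flow.reduce(fn -> %{} end, fn word, acc ->
      Map.update(acc, word, 1, & &1 + 1)
    end)
    |> Enum.to_list()
  end
end
```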

Ju Liu · Answer 7 · Fri May 12 2017 23:48:32 GMT+0800 (China Standard Time)

Aha, that's really cool, I can add that to the doc. Should I remove the comment then?

José Valim · Answer 8 · Fri May 12 2017 23:48:33 GMT+0800 (China Standard Time)

Closing this in favor of the PR anyway. :)

José Valim · Answer 9 · Fri May 12 2017 23:49:54 GMT+0800 (China Standard Time)

I think you can keep your comment and show an example with max_demand: 1 to illustrate how you can reproduce it. :)