dashbitco / flow

Computational parallel flows on top of GenStage

Home Page: https://hexdocs.pm/flow

How to catch exceptions?

KamilLelonek opened this issue

Related to: elixir-lang/gen_stage#132

I have a malformed CSV file:

this,is,malformed,"csv,data

and even if I do:

    try do
      file_path
      |> File.stream!()
      |> NimbleCSV.RFC4180.parse_stream()
      |> Flow.from_enumerable()
      |> Flow.partition()
      |> Enum.to_list()
    catch
      :exit, _reason -> nil
    end

I'm still getting:

08:02:07.440 [error] GenServer #PID<0.184.0> terminating
** (NimbleCSV.ParseError) expected escape character " but reached the end of file
    (nimble_csv) lib/nimble_csv.ex:207: NimbleCSV.RFC4180.finalize_parser/1
    (elixir) lib/stream.ex:800: Stream.do_transform/8
    (gen_stage) lib/gen_stage/streamer.ex:18: GenStage.Streamer.handle_demand/2
    (gen_stage) lib/gen_stage.ex:2170: GenStage.noreply_callback/3
    (gen_stage) lib/gen_stage.ex:2209: GenStage."-producer_demand/2-lists^foldl/2-0-"/3
    (stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:686: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message: {:"$gen_cast", {:"$demand", :forward}}
State: #Function<0.55142349/1 in GenStage.Streamer.init/1>

08:02:07.447 [info]  GenStage consumer #PID<0.185.0> is stopping after receiving cancel from producer #PID<0.184.0> with reason: {%NimbleCSV.ParseError{message: "expected escape character \" but reached the end of file"},
 [{NimbleCSV.RFC4180, :finalize_parser, 1,
   [file: 'lib/nimble_csv.ex', line: 207]},
  {Stream, :do_transform, 8, [file: 'lib/stream.ex', line: 800]},
  {GenStage.Streamer, :handle_demand, 2,
   [file: 'lib/gen_stage/streamer.ex', line: 18]},
  {GenStage, :noreply_callback, 3, [file: 'lib/gen_stage.ex', line: 2170]},
  {GenStage, :"-producer_demand/2-lists^foldl/2-0-", 3,
   [file: 'lib/gen_stage.ex', line: 2209]},
  {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 616]},
  {:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 686]},
  {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]}


08:02:07.449 [error] GenServer #PID<0.185.0> terminating
** (NimbleCSV.ParseError) expected escape character " but reached the end of file
    (nimble_csv) lib/nimble_csv.ex:207: NimbleCSV.RFC4180.finalize_parser/1
    (elixir) lib/stream.ex:800: Stream.do_transform/8
    (gen_stage) lib/gen_stage/streamer.ex:18: GenStage.Streamer.handle_demand/2
    (gen_stage) lib/gen_stage.ex:2170: GenStage.noreply_callback/3
    (gen_stage) lib/gen_stage.ex:2209: GenStage."-producer_demand/2-lists^foldl/2-0-"/3
    (stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:686: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message: {:DOWN, #Reference<0.3607181968.2794192898.201176>, :process, #PID<0.184.0>, {%NimbleCSV.ParseError{message: "expected escape character \" but reached the end of file"}, [{NimbleCSV.RFC4180, :finalize_parser, 1, [file: 'lib/nimble_csv.ex', line: 207]}, {Stream, :do_transform, 8, [file: 'lib/stream.ex', line: 800]}, {GenStage.Streamer, :handle_demand, 2, [file: 'lib/gen_stage/streamer.ex', line: 18]}, {GenStage, :noreply_callback, 3, [file: 'lib/gen_stage.ex', line: 2170]}, {GenStage, :"-producer_demand/2-lists^foldl/2-0-", 3, [file: 'lib/gen_stage.ex', line: 2209]}, {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 616]}, {:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 686]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]}}
State: {%{}, %{done?: true, producers: %{}, trigger: #Function<2.79412627/4 in Flow.Window.Global.materialize/5>}, {0, 4}, [], #Function<33.66250525/4 in Flow.Materialize.mapper_ops/1>}

Is there a way to handle that?

Do it inside a Task and use Task.yield to get the result. The exception is raised inside a GenStage producer process rather than in the process running your try/catch, which is why the catch never fires. Also note Flow is probably a cannon shot if all you want is to parse a CSV file in parallel. Take a look at Task.async_stream too.
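
For illustration, a minimal sketch of the Task-based approach; the module name, the inline Task.Supervisor, and the five-second timeout are assumptions, not part of the answer above. async_nolink is used instead of a plain Task.async so the crash in the parsing pipeline does not propagate to the caller over a link:

    # Hypothetical module wrapping the pipeline in a supervised task.
    defmodule SafeParse do
      def parse(file_path) do
        # An inline supervisor for the sketch; in a real app it would
        # live in your supervision tree.
        {:ok, sup} = Task.Supervisor.start_link()

        task =
          Task.Supervisor.async_nolink(sup, fn ->
            file_path
            |> File.stream!()
            |> NimbleCSV.RFC4180.parse_stream()
            |> Flow.from_enumerable()
            |> Flow.partition()
            |> Enum.to_list()
          end)

        # Task.yield/2 returns {:ok, result} on success, {:exit, reason}
        # if the task crashed, or nil if it is still running at the timeout.
        case Task.yield(task, 5_000) || Task.shutdown(task) do
          {:ok, rows} -> {:ok, rows}
          {:exit, reason} -> {:error, reason}
          nil -> {:error, :timeout}
        end
      end
    end

With the malformed file above, parse/1 should return {:error, {%NimbleCSV.ParseError{...}, stacktrace}} (the exit reason shown in the log) instead of crashing the caller.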

Actually, after parsing the CSV I'm doing A LOT of other work in parallel. Should I parse the entire file first and only then leverage Flow?

@KamilLelonek if the work is in parallel but per row, then async_stream will still be enough. You just do the whole processing per row inside the async_stream task. However, if you need to shuffle, partition, or group the rows in any way, then yes, Flow is the way to go.
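
A hedged sketch of the per-row async_stream alternative, assuming the work is independent per row; process_row/1 is a hypothetical function standing in for the real per-row work:

    file_path
    |> File.stream!()
    |> NimbleCSV.RFC4180.parse_stream()
    # Each row is processed in its own task, up to one per scheduler.
    |> Task.async_stream(&process_row/1, max_concurrency: System.schedulers_online())
    # Each element comes back as {:ok, result}.
    |> Enum.to_list()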