rdicosmo / parmap

Parmap is a minimalistic library for exploiting multicore architectures in OCaml programs with minimal modifications.

Home Page: http://rdicosmo.github.io/parmap/

seg faults

JuliaLawall opened this issue

I have the impression that, since some recent version of Parmap, Parmap segfaults whenever the underlying OCaml code raises an exception, which is not user-friendly. Has something changed in Parmap recently that would cause this issue? I will try to make a test case shortly.

Here is a test program:

let _ =
  Parmap.parmap
    (function x ->
      failwith "should not crash")
    (Parmap.L [1;2;3;4;5])

I am using parmap from opam, 4.07.0

As a workaround, you can try this:

let robust_parmap ncores chunksize f l =
  Parmap.parmap ~ncores ~chunksize (function x ->
    try Some (f x)
    with _exn -> None (* honestly, you should properly log the exact exception *)
    ) (Parmap.L l)

A former MIT student created an OCaml library just because he was not happy with the way parmap dealt with exception-raising code...
I don't think you can expect things to work if code inside the parallel section raises an exception; hence the try ... with.
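
A variant of the workaround above that keeps the exception text instead of discarding it could look like the sketch below. The names `protect` and `split_results` are hypothetical helpers, not part of Parmap. The exception is turned into a string in the worker, on the assumption that a plain string is safer to send back to the parent process than the exception value itself. The code only uses the standard library, so it can be tried without Parmap installed.

```ocaml
(* Hypothetical helper, not part of Parmap: run [f] on [x] and turn any
   exception into an [Error] carrying its printed form. *)
let protect (f : 'a -> 'b) (x : 'a) : ('b, string) result =
  try Ok (f x) with exn -> Error (Printexc.to_string exn)

(* Separate successful values from error messages, preserving order. *)
let split_results rs =
  List.fold_right
    (fun r (oks, errs) ->
      match r with
      | Ok v -> (v :: oks, errs)
      | Error e -> (oks, e :: errs))
    rs ([], [])

(* With Parmap, the wrapped function would be passed in the same way as
   in the workaround above, for example:
     Parmap.parmap ~ncores ~chunksize (protect f) (Parmap.L l)
   Here List.map stands in for it so that the sketch runs on its own. *)
let () =
  let f x = if x mod 2 = 0 then failwith "even input" else x * 10 in
  let oks, errs = split_results (List.map (protect f) [1; 2; 3]) in
  List.iter (Printf.printf "ok: %d\n") oks;
  List.iter (Printf.printf "error: %s\n") errs
```

The caller can then decide whether to log the messages, retry the failed inputs, or abort.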

@Drup, who is not watching the project, has committed some changes in the code recently.

I also get a core dump with parmap.

let _ =
  Parmap.parmap
    (function x ->
      failwith "should not crash")
    (Parmap.L [1;2;3;4;5]);;
[Parmap]: error at index j=0 in (0,0), chunksize=1 of a total of 1 got exception Failure("should not crash") on core 0 

[Parmap]: error at index j=0 in (1,1), chunksize=1 of a total of 1 got exception Failure("should not crash") on core 1 

[Parmap]: error at index j=0 in (2,2), chunksize=1 of a total of 1 got exception Failure("should not crash") on core 2 

[Parmap]: error at index j=0 in (3,3), chunksize=1 of a total of 1 got exception Failure("should not crash") on core 3 

[Parmap]: error at index j=0 in (4,4), chunksize=1 of a total of 1 got exception Failure("should not crash") on core 4 

Segmentation fault (core dumped)

Note that with Parany, the behavior would not be much better: UnixJunkie/parany#17 (infinite loop)

The parmap maintainer should try to git bisect from this 1.0-rc7 version using your example program.

Thanks Julia for providing this example: this indeed is not normal behaviour.

After closer investigation, it seems that this change in behaviour comes from differences in the way recent versions of OCaml check the data for integrity when unmarshaling from a block.
In previous versions one could find:

if (magic != Intext_magic_number) 
    failwith("input_value_from_block: bad object");

which raised an exception that could be caught at the OCaml level.

More recent versions of the compiler seem to skip at least this part of the test, which may explain the SIGSEGV.

Nonetheless, the right way to fix this is to inform the master process that a failure occurred, and to prevent any attempt to unmarshal data that will be corrupt or incomplete anyway.
This logic is already implemented in the "mapper" backend, but not in the "simplemapper" backend (which is the one used in the example from @JuliaLawall ).
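
As a rough illustration of that idea (this is not Parmap's code and not the fix itself), a worker can always marshal a tagged outcome, so that the master inspects the tag and never tries to decode a missing or half-written result. The type `outcome` and the functions `run_worker` and `read_worker` are hypothetical names, and a `Bytes.t` buffer stands in for whatever channel carries data between the processes.

```ocaml
(* Hypothetical wire format: the worker reports either its results or
   the printed form of the exception that stopped it. *)
type 'a outcome = Done of 'a | Failed of string

(* Worker side: the exception is caught here, so a complete, well-formed
   value is marshaled whether or not [f] fails. *)
let run_worker (f : 'a -> 'b) (xs : 'a list) : Bytes.t =
  let outcome =
    try Done (List.map f xs)
    with exn -> Failed (Printexc.to_string exn)
  in
  Marshal.to_bytes outcome []

(* Master side: decode the outcome and surface a failure as [Error]
   instead of trying to read results that were never produced. *)
let read_worker (buf : Bytes.t) : ('b list, string) result =
  match (Marshal.from_bytes buf 0 : 'b list outcome) with
  | Done ys -> Ok ys
  | Failed msg -> Error msg

let () =
  let report = function
    | Ok ys -> List.iter (Printf.printf "result: %d\n") ys
    | Error msg -> Printf.printf "worker failed: %s\n" msg
  in
  report (read_worker (run_worker (fun x -> x * 2) [1; 2; 3]));
  report (read_worker (run_worker (fun _ -> failwith "should not crash") [1]))
```

The point of the design is that the failure path and the success path both produce data the master can safely decode.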

This goes on my TODO list :-)

This is now fixed in commit f140dbc
This is now tagged as 1.2.3 and the corresponding opam package is submitted to opam-repository.
Thanks to @JuliaLawall for reporting this issue.