Luv-based lwt engine throws an error when polling
patricoferris opened this issue · comments
Bit of a strange issue I'm not quite sure where to open, but it involves weird interactions between luv-based lwt engines. It would be nice to have a smaller example than this, but presumably it has more to do with doing a lot of IO rather than anything to do with irmin-unix or git-unix.
Running the following (an updated version of https://irmin.org/tutorial/getting-started#sync) can run correctly or produce errors.
open Irmin_unix
open Lwt.Syntax

module Git_store = Git.FS.KV(Irmin.Contents.String)
module Sync = Irmin.Sync.Make(Git_store)

let main () =
  let config = Irmin_git.config ~bare:true "/tmp/irmin-repro" in
  let* repo = Git_store.Repo.v config in
  let* t = Git_store.main repo in
  let* remote = Git_store.remote "https://github.com/mirage/irmin.git" in
  let* _status = Sync.pull_exn t remote `Set in
  Lwt.return_unit

let standard_lwt () =
  Lwt_main.run (main ())

let luv_lwt () =
  let () = Lwt_engine.set (new Lwt_luv.engine) in
  Lwt_main.run (main ())

let eio_lwt () =
  Eio_main.run @@ fun env ->
  Lwt_eio.with_event_loop ~clock:env#clock @@ fun _token ->
  Lwt_eio.Promise.await_lwt (main ())

let () =
  (* standard_lwt () *)
  (* luv_lwt () *)
  eio_lwt ()
This is running under macOS with the following versions of key libraries:
irmin: vendored and checked out to the 3.3.0 release tag
git-unix: 3.9.1
lwt: 5.6.1 (in order to test the luv-based engine, it also adds the code at https://github.com/ocsigen/lwt/blob/01eb4583f1f3a782351621248c7cb705056fb63e/src/unix/luv/lwt_luv.ml)
eio: 0.4
The standard_lwt version seems to work correctly for me.
The luv_lwt version fails with the following:
Fatal error: exception Invalid_argument("Sync.pull_exn: Handshake got an error")
Raised at Lwt.Miscellaneous.poll in file "src/core/lwt.ml", line 3077, characters 20-29
Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 31, characters 10-20
Called from Lwt_main.run in file "src/unix/lwt_main.ml", line 118, characters 8-13
Re-raised at Lwt_main.run in file "src/unix/lwt_main.ml", line 124, characters 4-13
Called from Dune__exe__Main in file "error/main.ml", line 29, characters 2-12
The eio_lwt version fails with:
Eio_luv.Luv_error(EEXIST) (* file already exists *)
Raised at Eio_luv.or_raise in file "eio.0.4/lib_eio_luv/eio_luv.ml", line 53, characters 15-34
Called from Eio_luv.Low_level.Poll.await_writable in file "eio.0.4/lib_eio_luv/eio_luv.ml", line 384, characters 17-70
Called from Eio_luv.wakeup in file "eio.0.4/lib_eio_luv/eio_luv.ml", line 862, characters 4-8
Called from Luv__Error.catch_exceptions in file "src/error.ml", line 287, characters 4-7
It looks like the two luv-based failures could be linked. Any ideas? Perhaps something related to the implementation of polling in eio_luv?
http://docs.libuv.org/en/v1.x/poll.html says:
It is not okay to have multiple active poll handles for the same socket, this can cause libuv to busyloop or otherwise malfunction.
Possibly some kind of use-after-close of an FD? Can you strace the process, or similar?
Here's a dtruss log: https://gist.github.com/patricoferris/fdfff843b368fda54a8d131ffa477d3a (I added a print statement to see the FD from await_writable, hence "FD 13").
I don't see the error coming from the kernel. Probably libuv made it:
https://github.com/libuv/libuv/blob/988f2bfc4defb9a85a536a3e645834c161143ee0/src/unix/poll.c#L72
I guess we're trying to wait on the same FD twice. Ideally, Eio_luv would handle that correctly, though it's odd that we're being asked to do that, if so.
Looks like that might be the case:
let () =
  Eio_luv.run @@ fun _env ->
  Eio.Switch.run @@ fun sw ->
  let src, _dst = Eio_unix.socketpair ~sw () in
  let fd = Option.get @@ Eio_unix.FD.peek_opt src in
  Eio.Fiber.both
    (fun () -> Eio_unix.await_writable fd)
    (fun () -> Eio_unix.await_writable fd)
(* Eio_luv.Luv_error(EEXIST) (* file already exists *) *)
Should be fixed by ocaml-multicore/eio#279.
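For anyone reading later, here is a rough sketch of one way a backend could tolerate several waiters on the same FD: keep a single poll registration per FD and fan the wakeup out to everyone queued on it. This is only an illustration and not necessarily what the linked PR does. The `registry` type, the `polls_started` counter and both functions are hypothetical names invented for this sketch; nothing here calls Eio or libuv, and the FD is just an `int` stand-in.

```ocaml
(* Hypothetical sketch: share one poll registration among all waiters on an
   FD. None of these names come from Eio or libuv. *)

type fd = int (* stand-in for a real file descriptor *)

type registry = {
  waiters : (fd, (unit -> unit) list) Hashtbl.t; (* callbacks queued per FD *)
  mutable polls_started : int; (* number of backend polls we would have started *)
}

let create () = { waiters = Hashtbl.create 16; polls_started = 0 }

(* Register [callback] to run when [fd] becomes writable. A backend poll is
   started only for the first waiter; later waiters join the existing one. *)
let await_writable reg fd callback =
  match Hashtbl.find_opt reg.waiters fd with
  | Some callbacks -> Hashtbl.replace reg.waiters fd (callback :: callbacks)
  | None ->
    (* A real backend would start its single poll handle for [fd] here. *)
    reg.polls_started <- reg.polls_started + 1;
    Hashtbl.replace reg.waiters fd [ callback ]

(* Called when the backend reports [fd] as writable: drop the registration,
   then wake every waiter in the order they registered. *)
let on_writable reg fd =
  match Hashtbl.find_opt reg.waiters fd with
  | None -> ()
  | Some callbacks ->
    Hashtbl.remove reg.waiters fd;
    List.iter (fun callback -> callback ()) (List.rev callbacks)
```

With this shape, two calls to `await_writable reg 13 ...` bump `polls_started` only once, and a single `on_writable reg 13` runs both callbacks, so the backend never holds two active poll handles for one socket.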