ocaml-multicore / lwt_eio

Use Lwt libraries from within Eio

Luv-based lwt engine throws an error when polling

patricoferris opened this issue

This is a bit of a strange issue, and I'm not quite sure where to open it, but it involves weird interactions between luv-based lwt engines. It would be nice to have a smaller example than this, but presumably it has more to do with doing a lot of IO than with anything specific to irmin-unix or git-unix.

Running the following (an updated version of https://irmin.org/tutorial/getting-started#sync) can run correctly or produce errors.

open Irmin_unix
open Lwt.Syntax

module Git_store = Git.FS.KV(Irmin.Contents.String)
module Sync = Irmin.Sync.Make(Git_store)

let main () =
  let config = Irmin_git.config ~bare:true "/tmp/irmin-repro" in
  let* repo = Git_store.Repo.v config in
  let* t = Git_store.main repo in
  let* remote = Git_store.remote "https://github.com/mirage/irmin.git" in
  let* _status = Sync.pull_exn t remote `Set in
  Lwt.return_unit

let standard_lwt () =
  Lwt_main.run (main ())
  
let luv_lwt () =
  let () = Lwt_engine.set (new Lwt_luv.engine) in
  Lwt_main.run (main ())

let eio_lwt () =
  Eio_main.run @@ fun env ->
  Lwt_eio.with_event_loop ~clock:env#clock @@ fun _token ->
  Lwt_eio.Promise.await_lwt (main ())

let () =
  (* standard_lwt () *)
  (* luv_lwt () *)
  eio_lwt ()

This is running under macOS with the following versions of key libraries:

The standard_lwt version seems to work correctly for me.

The luv_lwt version fails with the following:

Fatal error: exception Invalid_argument("Sync.pull_exn: Handshake got an error")
Raised at Lwt.Miscellaneous.poll in file "src/core/lwt.ml", line 3077, characters 20-29
Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 31, characters 10-20
Called from Lwt_main.run in file "src/unix/lwt_main.ml", line 118, characters 8-13
Re-raised at Lwt_main.run in file "src/unix/lwt_main.ml", line 124, characters 4-13
Called from Dune__exe__Main in file "error/main.ml", line 29, characters 2-12

The eio_lwt version fails with:

Eio_luv.Luv_error(EEXIST) (* file already exists *)
Raised at Eio_luv.or_raise in file "eio.0.4/lib_eio_luv/eio_luv.ml", line 53, characters 15-34
Called from Eio_luv.Low_level.Poll.await_writable in file "eio.0.4/lib_eio_luv/eio_luv.ml", line 384, characters 17-70
Called from Eio_luv.wakeup in file "eio.0.4/lib_eio_luv/eio_luv.ml", line 862, characters 4-8
Called from Luv__Error.catch_exceptions in file "src/error.ml", line 287, characters 4-7

It looks like the two luv-based failures could be linked. Any ideas? Perhaps it is something related to the implementation of polling in eio_luv?

http://docs.libuv.org/en/v1.x/poll.html says:

It is not okay to have multiple active poll handles for the same socket, this can cause libuv to busyloop or otherwise malfunction.

Possibly some kind of use-after-close of an FD? Can you strace the process, or similar?

Here's a dtruss log: https://gist.github.com/patricoferris/fdfff843b368fda54a8d131ffa477d3a (I added a print statement to show the FD passed to await_writable, hence "FD 13").

I don't see the error coming from the kernel. libuv probably generated it itself:
https://github.com/libuv/libuv/blob/988f2bfc4defb9a85a536a3e645834c161143ee0/src/unix/poll.c#L72

I guess we're trying to wait on the same FD twice. Ideally, Eio_luv would handle that correctly, though it's odd that we're being asked to do that, if so.

Looks like that might be the case:

let () =
  Eio_luv.run @@ fun _env ->
  Eio.Switch.run @@ fun sw ->
  let src, _dst = Eio_unix.socketpair ~sw () in
  let fd = Option.get @@ Eio_unix.FD.peek_opt src in
  Eio.Fiber.both
    (fun () -> Eio_unix.await_writable fd)
    (fun () -> Eio_unix.await_writable fd)
(* Eio_luv.Luv_error(EEXIST) (* file already exists *) *)
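
For illustration, here is a minimal sketch of one way a backend could tolerate two waiters on the same FD: keep at most one active poll per FD and queue any further waiters behind it. This is not Eio_luv's actual code or its fix. Every name here (registry, await_ready, on_ready, polls_started) is hypothetical, plain ints stand in for FDs, and the poll itself is simulated by a counter. It also ignores the difference between waiting for readable and waiting for writable, which a real implementation would need to track.

```ocaml
(* Hypothetical sketch: none of these names come from Eio_luv or libuv. *)

(* Callbacks waiting on each FD. The int key stands in for a real FD. *)
type registry = {
  waiters : (int, (unit -> unit) list) Hashtbl.t;
  mutable polls_started : int;  (* how many (simulated) polls were started *)
}

let create () = { waiters = Hashtbl.create 16; polls_started = 0 }

(* Register [k] to run when [fd] becomes ready.
   Only the first waiter for an FD starts a poll; later ones join its queue. *)
let await_ready t fd k =
  match Hashtbl.find_opt t.waiters fd with
  | Some ks -> Hashtbl.replace t.waiters fd (k :: ks)
  | None ->
    (* A real backend would start its single poll handle for [fd] here. *)
    t.polls_started <- t.polls_started + 1;
    Hashtbl.replace t.waiters fd [k]

(* Called when the single poll for [fd] fires: wake every waiter,
   in the order in which they registered. *)
let on_ready t fd =
  match Hashtbl.find_opt t.waiters fd with
  | None -> ()
  | Some ks ->
    (* A real backend would stop the poll handle for [fd] here. *)
    Hashtbl.remove t.waiters fd;
    List.iter (fun k -> k ()) (List.rev ks)

(* Usage: two waiters on FD 13 share one poll and are both woken. *)
let () =
  let t = create () in
  let woken = ref [] in
  await_ready t 13 (fun () -> woken := "a" :: !woken);
  await_ready t 13 (fun () -> woken := "b" :: !woken);
  assert (t.polls_started = 1);
  on_ready t 13;
  assert (List.rev !woken = ["a"; "b"])
```

The point of the design is that the second await_ready call never creates a second poll, so the "multiple active poll handles for the same socket" situation described in the libuv docs cannot arise from two fibers waiting on the same FD.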