bytecodealliance / wit-bindgen

A language binding generator for WebAssembly interface types

Wasm component breakage between v0.20 and v0.21

juntyr opened this issue

In my research project, I compile several components (which implement different compression methods) and then run them with wasmtime (and a browser WebAssembly runtime). One of these components recently broke, while all the others continue to work. After some debugging, I found that reverting wit-bindgen to 0.20 avoids the error, but 0.21 and 0.22 both have the issue.

I have attached both versions of the component (apologies for the size - it includes the SZ3 compressor which is the only one that breaks). For 0.21, I get the following error (in wasmtime) when invoking the codec-id function, which takes no inputs and returns a string:

RuntimeError: in fcbench:codec/codecs@0.1.0.codec-id: error while executing at wasm backtrace:
    0: 0x32943f - <unknown>!<wasm function 2566>
    1: 0x363007 - <unknown>!<wasm function 5920>
    2: 0x3617f4 - <unknown>!<wasm function 5527>

From debugging in the other backend, I think the error may occur during the cabi post-return function.

Apologies also for not having a minimal reproducing case - for some reason, all smaller compressor components continue to work, so there might be some spooky side effects at a distance at play. Thank you in advance for any insights you can provide!

sz3-0.20.wasm
sz3-0.21.wasm

In the released commit range what you're probably running into is #876. That being said I don't know why that would be causing issues here.

Are you sure that wit-bindgen is all that changed between those two builds?

In 0.20.wasm above I see:

    (func (;6433;) (type 1) (param i32 i32 i32 i32) (result i32)
      local.get 0
      local.get 1
      local.get 2
      local.get 3
      call 6434
    )
;; ...
    (export "cabi_realloc" (func 6433))

This is what I would expect, that's just a small shim over another function.

In 0.21.wasm, however, I see:

    (func (;6459;) (type 1) (param i32 i32 i32 i32) (result i32)
      call 13
      local.get 0
      local.get 1
      local.get 2
      local.get 3
      call 6434
      call 4716
    )
;; ...
    (export "cabi_realloc" (func 6459))

That doesn't look right, specifically call 13 at the start and call 4716 at the end.

IIRC that was something some part of the toolchain used to do where it injected constructors/destructors explicitly around exports. There was a comment in wit-bindgen saying after Rust 1.69 it was no longer necessary so I turned it off by default. In the components though I see that 1.76 was used to compile both, so now I'm a bit confused.
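To make the difference between the two dumps concrete, here is a minimal host-side sketch of the two shapes of cabi_realloc. It assumes that call 13 is the constructor hook and call 4716 the destructor hook, which is only a guess based on the description above; the Instance type, its counters, and the bump allocator are all hypothetical stand-ins so the sketch can run outside wasm.

```rust
/// Hypothetical model of one wasm instance. The counters stand in for
/// whatever state the real constructors and destructors touch.
#[derive(Default)]
struct Instance {
    ctor_runs: u32,
    dtor_runs: u32,
    next_free: u32,
}

impl Instance {
    // Stand-in for the function behind `call 13` (assumed: constructors).
    fn call_ctors(&mut self) {
        self.ctor_runs += 1;
    }

    // Stand-in for the function behind `call 4716` (assumed: destructors).
    fn call_dtors(&mut self) {
        self.dtor_runs += 1;
    }

    // Stand-in for func 6434: a trivial bump allocator that ignores the
    // old allocation and the alignment.
    fn realloc_inner(&mut self, _old_ptr: u32, _old_size: u32, _align: u32, new_size: u32) -> u32 {
        let addr = self.next_free;
        self.next_free += new_size;
        addr
    }

    // Shape of the 0.20 export: a plain shim over the inner function.
    fn cabi_realloc_plain(&mut self, old_ptr: u32, old_size: u32, align: u32, new_size: u32) -> u32 {
        self.realloc_inner(old_ptr, old_size, align, new_size)
    }

    // Shape of the 0.21 export: the same call, bracketed by the two hooks.
    fn cabi_realloc_wrapped(&mut self, old_ptr: u32, old_size: u32, align: u32, new_size: u32) -> u32 {
        self.call_ctors();
        let addr = self.realloc_inner(old_ptr, old_size, align, new_size);
        self.call_dtors();
        addr
    }
}

fn main() {
    let mut inst = Instance::default();
    inst.cabi_realloc_plain(0, 0, 1, 8);
    inst.cabi_realloc_wrapped(0, 0, 1, 8);
    inst.cabi_realloc_wrapped(0, 0, 1, 8);
    // The wrapped shape re-runs both hooks on every export call.
    println!("ctors ran {} times, dtors ran {} times", inst.ctor_runs, inst.dtor_runs);
}
```

In this model the hooks run once per call to the wrapped export, so any constructor or destructor with side effects would be repeated on each call.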

Can you try specifying run_ctors_once_workaround: true, in the 0.21 build and see if it fixes the issue? If it does then I would be confused. That changed in #868 which was part of the 0.20.0 release, so both builds should in theory not work then...

@sunfishcode do you recall the story around __wasm_call_ctors? Is the change I made in #868 to stop explicitly calling it correct? I see now it may still be required and Rust 1.69 was just where some behavior changed...

Thank you so much for looking into this!

> Are you sure that wit-bindgen is all that changed between those two builds?

Yes, my only change was changing the version in my Cargo.toml dependencies section.

> That doesn't look right, specifically call 13 at the start and call 4716 at the end.

It is indeed weird, and interesting, that this only causes an issue with this component but not the others. The SZ3 component is one of a few that also need to compile a C library, wrapped by a *-sys Rust crate, for which I'm using wasi-sdk v20.0. But the other components with C parts still work ... intriguing.

> Can you try specifying run_ctors_once_workaround: true, in the 0.21 build and see if it fixes the issue? If it does then I would be confused. That changed in #868 which was part of the 0.20.0 release, so both builds should in theory not work then...

In v0.20, v0.21, and v0.22 that unfortunately gives this compilation error:

error[E0425]: cannot find function `bool_lift` in module `_rt`
    --> codecs/core-wasm/src/lib.rs:18:5
     |
  18 | /     wit_bindgen::generate!({
  19 | |         path: "../wit",
  20 | |         world: "fcbench-codec",
  21 | |         pub_export_macro: true,
  22 | |         run_ctors_once_workaround: true,
  23 | |     });
     | |______^ not found in `_rt`
     |
     = help: consider importing this function:
             wit_bindgen::rt::bool_lift

Is there some other option I need to specify?

Here's the situation to the best of my knowledge:

If you have what wasi-libc considers to be a "command", wasi-libc exports _start which calls __wasm_call_ctors before calling the user main function.

If you have what wasi-libc considers to be a "reactor", wasi-libc exports _initialize which calls __wasm_call_ctors.

If your program contains constructors but doesn't contain a call to __wasm_call_ctors anywhere, wasm-ld will auto-wrap all exported functions to add a call to __wasm_call_ctors. Notably, it doesn't protect against __wasm_call_ctors being called more than once.

In Rust, a "bin" crate uses wasi-libc's notion of "command", so it gets a call to __wasm_call_ctors. Rust never fully adopted the wasi-libc notion of a "reactor", so a "cdylib" crate does not use this notion of "reactor", so it does not have a _initialize function and does not get a call to __wasm_call_ctors. So if the program has constructors somehow, it will get these wrappers that call __wasm_call_ctors.
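Since nothing protects __wasm_call_ctors against being called more than once, one common remedy is a run-once guard that every export calls first. The following is a minimal sketch of that idea, not wit-bindgen's actual code: fake_wasm_call_ctors is a hypothetical stand-in for __wasm_call_ctors so the example runs on any host, and codec_id with its "sz3" return value is an illustrative export name only.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

// Counts how often the stand-in constructor routine actually ran.
static CTOR_RUNS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for `__wasm_call_ctors`, so the sketch runs on any host.
fn fake_wasm_call_ctors() {
    CTOR_RUNS.fetch_add(1, Ordering::SeqCst);
}

// Runs the constructors on the first call only; later calls are no-ops.
fn run_ctors_once() {
    static GUARD: Once = Once::new();
    GUARD.call_once(fake_wasm_call_ctors);
}

// Hypothetical export: every export calls the guard before doing its work.
fn codec_id() -> String {
    run_ctors_once();
    String::from("sz3")
}

fn main() {
    for _ in 0..3 {
        let _ = codec_id();
    }
    // prints "constructors ran 1 time(s)"
    println!("constructors ran {} time(s)", CTOR_RUNS.load(Ordering::SeqCst));
}
```

Because the program now contains an explicit call to the constructor routine, the linker's auto-wrapping described above would presumably no longer apply, and the guard keeps repeated export calls from re-running the constructors.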

Thanks @sunfishcode! I was halfway through pinging you to ask whether you had more info, while investigating some more at the same time, and that makes sense!

I think the change in #868 to remove calling ctors was incorrect and I'll flip it back on by default.

@juntyr looks like there's a copy/paste typo leading to the compilation error you're seeing above.

I'll look to fix this all.

> In Rust, a "bin" crate uses wasi-libc's notion of "command", so it gets a call to __wasm_call_ctors. Rust never fully adopted the wasi-libc notion of a "reactor", so a "cdylib" crate does not use this notion of "reactor", so it does not have a _initialize function and does not get a call to __wasm_call_ctors. So if the program has constructors somehow, it will get these wrappers that call __wasm_call_ctors.

Thanks @sunfishcode for that explanation! I remember that in an old version of my code, I used to have the following:

#[cfg(target_arch = "wasm32")]
#[doc(hidden)]
#[no_mangle]
pub extern "C" fn _initialize() {
    extern "C" {
        fn __wasm_call_ctors();
    }

    unsafe { __wasm_call_ctors() }
}

I wonder what happens if I include that again in my now-component-model code ... I'll have results in a bit

If that ends up working @juntyr, could you also try #895 and see if that works?

@alexcrichton Adding that code back in does indeed fix the issue. I'll try to rebuild with your PR now.

@alexcrichton Yes, your PR also seems to fix the issue!

Ok thanks for the confirmation! I think it's still a bit of a mystery why 0.20.0 worked, but regardless I think that I made this change in error.