Misaligned pointer dereference on Mac M1 Pro & Mac M2 Pro
Eitu33 opened this issue · comments
We have been encountering the following issue when running wasmer 3.3 and wasmer 4.0 on M1 & M2:
thread 'execution' panicked at 'misaligned pointer dereference: address must be a multiple of 0x10 but is 0x109a5fc18', /Users/thomas/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmer-vm-4.0.0/src/trap/traphandlers.rs:219:28
rustc 1.72.0-nightly (f4b80cacf 2023-06-30)
binary: rustc
commit-hash: f4b80cacf93ca216c75f6ae12f4b9dec19eba42f
commit-date: 2023-06-30
host: aarch64-apple-darwin
release: 1.72.0-nightly
LLVM version: 16.0.5
This issue is not reproduced on:
linux_x86_64
win_x86_64
After looking into the traphandlers.rs
code, we noticed that there was a cast made towards the libc::ucontext_t
type:
let ucontext = &mut *(context as *mut libc::ucontext_t);
But the used libc::ucontext_t
is different depending on the platform, on linux we have:
libc::unix::linux_like::linux::gnu::b64::x86_64
pub struct ucontext_t // size = 936 (0x3A8), align = 0x8
Located at libc-0.2.147/src/unix/linux_like/linux/gnu/b64/x86_64/mod.rs
And for mac we have:
libc::unix::bsd::apple::b64::aarch64::align
pub struct ucontext_t // size = 880 (0x370), align = 0x10
Located at libc-0.2.147/src/unix/bsd/apple/b64/aarch64/align.rs
The context that is being casted into ucontext_t
at L219
has its alignment set at 0x08
:
context: *mut c_void // size = 8, align = 0x8
This produces a misalignment between the expected 0x10
and the provided 0x08
on these types of mac architectures.
This is currently a big issue for us, if help is required we would be glad to contribute.
we're also hitting this in 2.3.0 in anoma/namada#1721
Might be related: #4059
Get same error on M1 16GB with wasmer-vm-3.1.1/src/trap/traphandlers.rs:220:28
Note that this is likely becoming visible due to rust-lang/rust#98112 landing in Rust 1.70, which adds debug assertions to misaligned pointer dereferences.
It's possible that macOS does not guarantee this pointer is going to be 16 byte aligned even though a structure within it (the NEON floating point state) is nominally supposed to be. I can't easily find anything about it, you might end up having to file a radar about it. I am sure you can work around it writing out your own less-aligned structure to cast to, though, but that's obviously not fun.
Hi,
In the original issue it's not an assert that fails.
It's that cast: context as *mut libc::ucontext_t
that causes the panic.
context
is *mut c_void
AFAIK, for the cast to work one need align(*mut c_void) >= align(*mut libc::ucontext_t)
but align(*mut c_void) == 0x8
and align(*mut libc::ucontext_t) == 0x10
so obviously the property is not meet.
looking at https://github.com/rust-lang/libc/blob/main/src/unix/bsd/apple/b64/aarch64/align.rs
if ucontext_t
is 0x10
align it's because it inherits it's alignment from __darwin_mcontext64
which itself inherits it from __darwin_arm_neon_state64
which is explicitly aligned to 16 here if libc_int128
is not defined.
I guess that if libc_int128
is defined this line pub __v: [::__uint128_t; 32],
would also causes the struct to be 16 aligned.
If this is correct (don't' have M1 or M2 so I can't try) this seems to be a dead end, but this would lead me to think that on such platform * c_void should be aligned to 16, so maybe there is something to look on the rust std side.
But pointers are aligned to usize isn't it?
Looking at https://doc.rust-lang.org/beta/core/ffi/enum.c_void.html# I found this
To model pointers to opaque types in FFI, until extern type is stabilized, it is recommended to use a newtype wrapper around an empty byte array. See the Nomicon for details.
I'm not smart enough to find a solution using opaque type (more precisely I think it won't help here), but maybe some else will.
Regards
I was able to hack a fix to this issue, based on the research presented in this thread.
sug0@df6aa26#diff-5d272e62ff3e9e2d5249a6300df4eb51eeb019e143582bfe8f93d7cd3674d176R150-R168
If anyone has some free time, please test the patch. The branch (applel-fix-heliax
) in the linked remote contains other alignment patches, originally authored by some folks from the NEAR project. All patches are based on v2.3.0
, but I reckon they should be easily ported over to new versions.
Cheers,
Tiago
The fix looks fine, and I can implement it. The issue is, how can I reproduce the issue, there is no simple way to have it seems?
I have the hardware, but not a simple way to reproduce the issue.
We have code that fails here.
It's not simple but at least it's public code.
I'm gonna see with my teammates if we can simplifies things.
Edit:
At least we can do tests for you if you want.
@bilboquet can you test with that branch #4120
@bilboquet any news on the fix efficiency?
Ok, closing the issue!
@Eitu33, do you have a PoC to reproduce the issue in wasmer 3.3 or wasmer 4.0? thanks.
@junxzm1990 No, the issue was reproduced in a relatively complex scenario. I could make a PoC but I'm not sure I can find the time / get the clearance to do so. What I work on is open source so If you really want to reproduce the issue I could make a branch on our repo with everything setup to reproduce, but that would be very far from a minimal example.
@Eitu33, A repo for reproduction would be greatly appreciated. I can follow up on the repo attempting to create a simpler PoC.
Hello.
I've tried to experiment with some thinks, but I think I've got MRE of exact same problem.
In some cases changing *mut oneshot::Sender<()>
to another type worked without panics
Wasm module code:
use std::ffi::c_void;
use tokio::sync::oneshot;
#[no_mangle]
extern "C" fn callback(ptr: *mut c_void, code: i32, data: *const u8, len: u32) {
let tx = unsafe { Box::from_raw(ptr as *mut oneshot::Sender<()>) };
}
Dependencies:
[dependencies]
tokio = { version = "1.24.2", default-features = false, features = [ "rt", "sync" ] }
[crates-io.patch]
tokio = { git = "https://github.com/wasix-org/tokio.git", branch = "1.24.2-fixed" }
Build with:
cargo wasix build
And host part is:
let wasm_bytes = include_bytes!("../../target/wasm32-wasmer-wasi/debug/module.wasm");
let mut store = Store::default();
let module = Module::new(&store, wasm_bytes)?;
let mut wasi_fenv = WasiEnv::builder("test").finalize(&mut store)?;
let mut imports = wasi_fenv
.import_object_for_all_wasi_versions(&mut store, &module)
.unwrap();
let shared_memory = module.imports().memories().next().map(|a| *a.ty());
let spawn_type = match shared_memory {
Some(ty) => SpawnMemoryType::CreateMemoryOfType(ty),
None => SpawnMemoryType::CreateMemory,
};
let memory = {
let mut store_mut = store.as_store_mut();
let data_mut = wasi_fenv.data_mut(&mut store_mut).clone();
data_mut.tasks().build_memory(&mut store_mut, spawn_type)?
};
imports.define("env", "memory", memory.clone().unwrap());
let instance = Instance::new(&mut store, &module, &imports).unwrap();
wasi_fenv.initialize_with_memory(&mut store, instance.clone(), memory, true)?;
Ok(())
Panics:
thread 'main' panicked at 'misaligned pointer dereference: address must be a multiple of 0x8 but is 0x13307170c', /Users/flassie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmer-vm-4.1.2/src/instance/mod.rs:167:18
stack backtrace:
0: rust_begin_unwind
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
1: core::panicking::panic_nounwind_fmt
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:96:14
2: core::panicking::panic_misaligned_pointer_dereference
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:175:5
3: wasmer_vm::instance::Instance::imported_memory
at /Users/flassie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmer-vm-4.1.2/src/instance/mod.rs:167:18
4: wasmer_vm::instance::Instance::get_vmmemory
at /Users/flassie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmer-vm-4.1.2/src/instance/mod.rs:256:26
5: wasmer_vm::instance::Instance::memory_init
at /Users/flassie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmer-vm-4.1.2/src/instance/mod.rs:748:22
6: wasmer_vm_memory32_init
at /Users/flassie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmer-vm-4.1.2/src/libcalls.rs:628:9
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread caused non-unwinding panic. aborting.
I've also tried to find what exactly causes panic on module side, so:
use std::ffi::c_void;
use tokio::sync::oneshot;
#[no_mangle]
extern "C" fn callback(ptr: *mut c_void, code: i32, data: *const u8, len: u32) {
let ptr: *mut oneshot::Sender<()> = ptr as _;
let tx = unsafe { Box::from_raw(ptr) }; // if commented - everything is ok
}