Chapter 4 code not working inside aarch64 VM
Ty3uK opened this issue · comments
Hi!
First of all huge thanks for this awesome book!
I'm working on a MacBook Pro with an M1 Pro, so I'm using OrbStack to run a Linux VM for the code from chapter 4.
When I run this code in an aarch64 VM, I get this error:
[ty3uk@fedora a-epoll]$ cargo run
Compiling a-epoll v0.1.0 (/home/ty3uk/Asynchronous-Programming-in-Rust/ch04/a-epoll)
Finished dev [unoptimized + debuginfo] target(s) in 0.49s
Running `target/debug/a-epoll`
thread 'main' panicked at src/main.rs:30:19:
index out of bounds: the len is 5 but the index is 43680
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
But when I run it in an amd64 VM, it runs perfectly:
[ty3uk@fedora async-rs]$ cargo run
Compiling async-rs v0.1.0 (/home/ty3uk/async-rs)
Finished dev [unoptimized + debuginfo] target(s) in 2.47s
Running `target/debug/async-rs`
RECEIVED: Event { events: 1, epoll_data: 4 }
HTTP/1.1 200 OK
content-type: text/plain;charset=utf-8
Date: Wed, 21 Feb 2024 12:58:22 GMT
Content-Length: 9
request-4
------
RECEIVED: Event { events: 1, epoll_data: 3 }
HTTP/1.1 200 OK
content-type: text/plain;charset=utf-8
Date: Wed, 21 Feb 2024 12:58:23 GMT
Content-Length: 9
request-3
------
RECEIVED: Event { events: 1, epoll_data: 2 }
HTTP/1.1 200 OK
content-type: text/plain;charset=utf-8
Date: Wed, 21 Feb 2024 12:58:24 GMT
Content-Length: 9
request-2
------
RECEIVED: Event { events: 1, epoll_data: 1 }
HTTP/1.1 200 OK
content-type: text/plain;charset=utf-8
Date: Wed, 21 Feb 2024 12:58:25 GMT
Content-Length: 9
request-1
------
RECEIVED: Event { events: 1, epoll_data: 0 }
HTTP/1.1 200 OK
content-type: text/plain;charset=utf-8
Date: Wed, 21 Feb 2024 12:58:26 GMT
Content-Length: 9
request-0
------
FINISHED
What could be the reason? A different implementation of epoll on aarch64 and amd64? Or something else? Just curious.
Hmm, that's interesting. The syscalls should be the same, so I don't think that's the problem. It seems like the problem is that the data returned in the field epoll_data, which we use to index into the streams collection, is wrong. It could be a number of things really, but that seems to be the place where it goes wrong.
I would start by changing from usize to u64 in the definition of Event in ffi.rs:
Change:
#[derive(Debug)]
#[repr(C, packed)]
pub struct Event {
    pub(crate) events: u32,
    // Token to identify event
    pub(crate) epoll_data: usize,
}
To:
#[derive(Debug)]
#[repr(C, packed)]
pub struct Event {
    pub(crate) events: u32,
    // Token to identify event
    pub(crate) epoll_data: u64,
}
You'll probably have to cast/change to u64 in several other places as well, but the compiler should guide you. That way we at least know for sure that the data in that field will be treated as a 64-bit field of a concrete type.
If it still doesn't work, it needs to be debugged a bit further. Something to try is to import the libc definition of the Event struct and see if that works better: https://docs.rs/libc/latest/libc/struct.epoll_event.html.
I don't have that platform on hand to experiment with myself at the moment, but please let me know if you get it working.
@cfsamson, maybe I did something wrong, but it's still not working :)
diff --git a/ch04/a-epoll/src/ffi.rs b/ch04/a-epoll/src/ffi.rs
index 6f9f8f0..a55bffc 100644
--- a/ch04/a-epoll/src/ffi.rs
+++ b/ch04/a-epoll/src/ffi.rs
@@ -15,11 +15,11 @@ extern "C" {
 pub struct Event {
     pub(crate) events: u32,
     // Token to identify event
-    pub(crate) epoll_data: usize,
+    pub(crate) epoll_data: u64,
 }
 impl Event {
     pub fn token(&self) -> usize {
-        self.epoll_data
+        self.epoll_data as usize
     }
 }
diff --git a/ch04/a-epoll/src/poll.rs b/ch04/a-epoll/src/poll.rs
index 352d2e7..09c5ec7 100644
--- a/ch04/a-epoll/src/poll.rs
+++ b/ch04/a-epoll/src/poll.rs
@@ -60,7 +60,7 @@ impl Registry {
     pub fn register(&self, source: &TcpStream, token: usize, interests: i32) -> Result<()> {
         let mut event = ffi::Event {
             events: interests as u32,
-            epoll_data: token,
+            epoll_data: token as u64,
         };
         let op = ffi::EPOLL_CTL_ADD;
[ty3uk@fedora-aarch64 a-epoll]$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/a-epoll`
thread 'main' panicked at src/main.rs:30:19:
index out of bounds: the len is 5 but the index is 43680
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
@cfsamson, it's working after I changed our Event to epoll_event as you suggested. 🙂
@cfsamson I am also running an M1 Mac, and I ran into this issue as well. After spending a few hours on it, I finally tracked down the cause: the epoll_event struct is actually not packed on the aarch64 architecture. In fact, it seems only x86_64 has it packed, for backward compatibility with 32-bit arch syscalls.
epoll_event as implemented inside the libc Rust library:
https://github.com/rust-lang/libc/blob/3d0b15bbcc21d13219124cd74e2ff2d652f2f392/src/unix/linux_like/mod.rs#L209
So simply removing repr(packed) from the Event struct will make it work on aarch64:
#[derive(Debug)]
#[repr(C)]
pub struct Event {
    pub(crate) events: u32,
    // Token to identify event
    pub(crate) epoll_data: usize,
}
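To make the layout difference concrete, here is a minimal sketch that simulates the mismatch in plain Rust, without any syscalls. The names PackedEvent, AlignedEvent and misread_token are hypothetical and not from the book's code. It assumes a target where u64 has 8-byte alignment (as on x86_64 and aarch64), and it only illustrates how a token gets misread; it does not reproduce the exact value 43680 from the panic, which would depend on what happened to be in memory.

```rust
use std::mem::size_of;

// The layout the chapter's code declared: no padding, 12 bytes,
// with the token stored at byte offset 4.
#[repr(C, packed)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct PackedEvent {
    events: u32,
    epoll_data: u64,
}

// Natural C alignment: 4 bytes of padding after `events`, so the token
// sits at byte offset 8 and the struct is 16 bytes where u64 is 8-aligned.
#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct AlignedEvent {
    events: u32,
    epoll_data: u64,
}

// Hypothetical helper: builds the bytes a producer using the aligned
// layout would write, then reads them back through the packed layout.
fn misread_token(events: u32, token: u64) -> u64 {
    let mut buf = [0u8; 16];
    buf[0..4].copy_from_slice(&events.to_ne_bytes()); // events at 0..4
    buf[8..16].copy_from_slice(&token.to_ne_bytes()); // token at 8..16

    // SAFETY: PackedEvent is 12 bytes with alignment 1 and holds only
    // integers, so any 12 initialized bytes form a valid value.
    let wrong: PackedEvent =
        unsafe { std::ptr::read_unaligned(buf.as_ptr().cast::<PackedEvent>()) };
    wrong.epoll_data // read from bytes 4..12, which straddle padding and token
}

fn main() {
    println!("packed:  {} bytes", size_of::<PackedEvent>());
    println!("aligned: {} bytes", size_of::<AlignedEvent>());
    println!("token 4 read through the packed layout: {}", misread_token(1, 4));
}
```

On a little-endian machine the misread token comes out as 4 << 32 rather than 4, because the packed reader picks up the padding bytes first. With a real kernel, the padding and any bytes past the end of a 12-byte struct are not guaranteed to be zero, which is one plausible way to end up with an arbitrary index.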
That's great! I suspected there was something about how that struct was treated on aarch64. Seems like @mrdemiurgic has already figured out why below.
That's great. I was curious about this as well, so thanks for posting; it saved me some time digging into libc on aarch64. It never occurred to me that there would be differences between the architectures for a struct like this, but that also explains why this detail was left out of the man pages.
I learned something new today. Thanks. I think we should leave this open for others running Linux on aarch64 so they can easily figure out what's wrong. I'll see if I can implement a fix by conditionally compiling the struct as packed or non-packed based on platform later on.
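For anyone landing here before looking at the fix, this is a rough sketch of what compiling the struct as packed only on x86_64 could look like. It is not necessarily what the actual fix does. It uses u64 for the data field, which is a choice made for this sketch (the struct earlier in the thread uses usize), and it only special-cases x86_64, as described above.

```rust
// Sketch only: `repr(packed)` is applied on x86_64 and nowhere else.
// Separate `repr` attributes on one item are combined by the compiler.
#[derive(Debug)]
#[repr(C)]
#[cfg_attr(target_arch = "x86_64", repr(packed))]
pub struct Event {
    pub(crate) events: u32,
    // Token to identify event
    pub(crate) epoll_data: u64,
}

impl Event {
    pub fn token(&self) -> usize {
        // Copies the field by value, which is fine even when packed.
        self.epoll_data as usize
    }
}

fn main() {
    let event = Event { events: 1, epoll_data: 3 };
    // Size depends on the target: packed on x86_64, padded elsewhere.
    println!("size_of::<Event>() = {}", std::mem::size_of::<Event>());
    println!("{:?} -> token {}", event, event.token());
}
```

The upside of cfg_attr is that there is a single struct definition, so the rest of the code doesn't need to know which layout is in use.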
As this issue is fixed by #10 and the code now references this issue, I'll close this for now. Thanks for reporting!