PacktPublishing / Asynchronous-Programming-in-Rust

Asynchronous Programming in Rust, published by Packt

Incorrect output in chapter 4 examples

Boingboingsplat opened this issue

When running the a-epoll example from Chapter 4, it only outputs 3 responses before finishing:

RECEIVED: Event { events: 1, epoll_data: 4 }
HTTP/1.1 200 OK
content-length: 9
connection: close
content-type: text/plain; charset=utf-8
date: Mon, 19 Feb 2024 18:27:00 GMT

request-4
------

RECEIVED: Event { events: 1, epoll_data: 3 }
HTTP/1.1 200 OK
content-length: 9
connection: close
content-type: text/plain; charset=utf-8
date: Mon, 19 Feb 2024 18:27:01 GMT

request-3
------

RECEIVED: Event { events: 1, epoll_data: 2 }
HTTP/1.1 200 OK
content-length: 9
connection: close
content-type: text/plain; charset=utf-8
date: Mon, 19 Feb 2024 18:27:02 GMT

request-2
------

FINISHED

The behavior is the same in b-epoll-mio, with three responses handled before the program finishes.

I investigated, and the issue appears to be that after a stream's buffer has been drained, I receive another event for that same stream. The read on the already empty buffer immediately falls through to the Ok(n) if n == 0 match arm, which increments handled_events an extra time.
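To make the failure mode concrete, here is a minimal sketch that does not use epoll at all. It assumes in-memory `Cursor` values as stand-ins for the `TcpStream`s, and the helper names `drain` and `naive_count` are hypothetical. It only illustrates that a read on an already drained stream returns `Ok(0)` again, so a counter that increments on every `Ok(0)` counts the same stream twice:

```rust
use std::io::{Cursor, Read};

/// Reads `stream` until `read` reports EOF (`Ok(0)`) and returns the byte count.
fn drain<R: Read>(stream: &mut R) -> std::io::Result<usize> {
    let mut buf = [0u8; 16];
    let mut total = 0;
    loop {
        match stream.read(&mut buf)? {
            0 => return Ok(total), // EOF: nothing (more) to read
            n => total += n,
        }
    }
}

/// Counts a stream as "handled" every time a notification for it ends in EOF,
/// which mirrors the counting logic of the original example.
fn naive_count(tokens: &[usize], streams: &mut [Cursor<Vec<u8>>]) -> std::io::Result<usize> {
    let mut handled = 0;
    for &token in tokens {
        drain(&mut streams[token])?;
        handled += 1;
    }
    Ok(handled)
}

fn main() -> std::io::Result<()> {
    let mut streams = vec![
        Cursor::new(b"request-0".to_vec()),
        Cursor::new(b"request-1".to_vec()),
    ];
    // Token 0 is reported twice, standing in for the duplicated notification.
    let handled = naive_count(&[0, 0], &mut streams)?;
    // The counter reaches 2 even though stream 1 was never read.
    println!("handled = {handled}, streams = {}", streams.len());
    Ok(())
}
```

With two streams and a target of two handled events, the loop would stop here before the second response was ever read, which matches the early FINISHED in the output above.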

I'm running the examples from Ubuntu 20.04.4 LTS in WSL. For good measure, here's my WSL version information:

> wsl --version
WSL version: 2.0.9.0
Kernel version: 5.15.133.1-1
WSLg version: 1.0.59
MSRDC version: 1.2.4677
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.3155

That's strange, but thanks for posting.

I run Ubuntu 20.04 on WSL 2 myself and get the expected result. It seems like you're running a version of WSL installed through the Windows Store (since the wsl --version command works for you but not for me). I'm not sure if that matters.

I know that I've simplified the example somewhat, since we don't check what kind of event is returned (it could be a hangup signal, for example, in which case it should be visible in the events field). I've encountered mio reporting an extra event when the socket is closed by the server (this happens on Windows), so that might be what's happening here as well.

Let's check two things:

  1. Make sure delayserver is running under the same Ubuntu instance in WSL
  2. Let's add a printout of the events field that epoll returns by adding two lines to the handle_events function, so we can check the bitflags and see whether any other events are reported:

Add:

loop {
            match streams[index].read(&mut data) {
                Ok(n) if n == 0 => {
                    let evts = event.events;
                    println!("{:032b}", evts);
                    handled_events += 1;
                    break;

The handle_events function should look like this now:

fn handle_events(events: &[Event], streams: &mut [TcpStream]) -> Result<usize> {
    let mut handled_events = 0;
    for event in events {
        let index = event.token();
        let mut data = vec![0u8; 4096];

        loop {
            match streams[index].read(&mut data) {
                Ok(n) if n == 0 => {
                    let evts = event.events;
                    println!("{:032b}", evts);
                    handled_events += 1;
                    break;
                }
                Ok(n) => {
                    let txt = String::from_utf8_lossy(&data[..n]);

                    println!("RECEIVED: {:?}", event);
                    println!("{txt}\n------\n");
                }
                Err(e) if e.kind() == io::ErrorKind::WouldBlock => break,
                // this was not in the book example, but it's an error condition
                // you probably want to handle in some way (either by breaking
                // out of the loop or trying a new read call immediately)
                Err(e) if e.kind() == io::ErrorKind::Interrupted => break,
                Err(e) => return Err(e),
            }
        }
    }

    Ok(handled_events)
}

When I run the example I get:

RECEIVED: Event { events: 1, epoll_data: 4 }
HTTP/1.1 200 OK
content-length: 9
connection: close
content-type: text/plain; charset=utf-8
date: Wed, 21 Feb 2024 18:36:25 GMT

request-4
------

00000000000000000000000000000001
RECEIVED: Event { events: 1, epoll_data: 3 }
HTTP/1.1 200 OK
content-length: 9
connection: close
content-type: text/plain; charset=utf-8
date: Wed, 21 Feb 2024 18:36:26 GMT

request-3
------

00000000000000000000000000000001
RECEIVED: Event { events: 1, epoll_data: 2 }
HTTP/1.1 200 OK
content-length: 9
connection: close
content-type: text/plain; charset=utf-8
date: Wed, 21 Feb 2024 18:36:27 GMT

request-2
------

00000000000000000000000000000001
RECEIVED: Event { events: 1, epoll_data: 1 }
HTTP/1.1 200 OK
content-length: 9
connection: close
content-type: text/plain; charset=utf-8
date: Wed, 21 Feb 2024 18:36:28 GMT

request-1
------

00000000000000000000000000000001
RECEIVED: Event { events: 1, epoll_data: 0 }
HTTP/1.1 200 OK
content-length: 9
connection: close
content-type: text/plain; charset=utf-8
date: Wed, 21 Feb 2024 18:36:29 GMT

request-0
------

00000000000000000000000000000001
FINISHED

If you post the output you get I think we can figure this out pretty quickly.

Oh, and one more thing. Reboot your system. I've had more than one occasion where the networking stack in WSL has given me some issues that I couldn't resolve without rebooting the system.

@Boingboingsplat, did you resolve this issue or do you still get the wrong output?

I also encountered the same issue on Linux. For instance, the output with your added println looks like this on my machine:

RECEIVED: Event { events: 1, epoll_data: 4 }
HTTP/1.1 200 OK
content-length: 9
connection: close
content-type: text/plain; charset=utf-8
date: Mon, 26 Feb 2024 13:07:13 GMT

request-4
-----

00000000000000000000000000000001
RECEIVED: Event { events: 1, epoll_data: 3 }
HTTP/1.1 200 OK
content-length: 9
connection: close
content-type: text/plain; charset=utf-8
date: Mon, 26 Feb 2024 13:07:14 GMT

request-3
-----

00000000000000000000000000000001
RECEIVED: Event { events: 1, epoll_data: 2 }
HTTP/1.1 200 OK
content-length: 9
connection: close
content-type: text/plain; charset=utf-8
date: Mon, 26 Feb 2024 13:07:15 GMT

request-2
-----

00000000000000000000000000000001
RECEIVED: Event { events: 1, epoll_data: 1 }
HTTP/1.1 200 OK
content-length: 9
connection: close
content-type: text/plain; charset=utf-8
date: Mon, 26 Feb 2024 13:07:16 GMT

request-1
-----

00000000000000000000000000000001
00000000000000000000000000000001
FINISHED

However, this behavior is not consistent. Sometimes it works without issues and looks like your output, but sometimes poll returns the same event twice and it is immediately counted as handled again.

@erdmannc, thank you very much for reporting. OK, I see. The EPOLLIN bitflag has the value 0x01, and if that's the only event that's reported, the bitmask would look like 00000000000000000000000000000001 just like you (and I) got in the output. It also makes sense that this only happens occasionally.
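As a small aid for reading such bitmasks, the sketch below decodes an events value into flag names. The constant values are the ones Linux defines for epoll. The `FLAGS` table and the helper name `flag_names` are hypothetical and not part of the book's ffi module, and the `u32` type for the flags is an assumption made for this illustration:

```rust
// Linux epoll event flags (values as defined in <sys/epoll.h>).
const EPOLLIN: u32 = 0x001;
const EPOLLPRI: u32 = 0x002;
const EPOLLOUT: u32 = 0x004;
const EPOLLERR: u32 = 0x008;
const EPOLLHUP: u32 = 0x010;
const EPOLLRDHUP: u32 = 0x2000;

// Hypothetical lookup table used only for this illustration.
const FLAGS: [(u32, &str); 6] = [
    (EPOLLIN, "EPOLLIN"),
    (EPOLLPRI, "EPOLLPRI"),
    (EPOLLOUT, "EPOLLOUT"),
    (EPOLLERR, "EPOLLERR"),
    (EPOLLHUP, "EPOLLHUP"),
    (EPOLLRDHUP, "EPOLLRDHUP"),
];

/// Returns the names of all known flags that are set in `events`.
fn flag_names(events: u32) -> Vec<&'static str> {
    let mut names = Vec::new();
    for &(bit, name) in FLAGS.iter() {
        if (events & bit) != 0 {
            names.push(name);
        }
    }
    names
}

fn main() {
    // The value seen in the output above: only the lowest bit is set.
    let evts: u32 = 0b1;
    println!("{:032b} -> {:?}", evts, flag_names(evts));

    // For comparison, a readable socket whose peer has also hung up.
    let evts = EPOLLIN | EPOLLHUP;
    println!("{:032b} -> {:?}", evts, flag_names(evts));
}
```

A mask that decodes to EPOLLIN alone, as it does here, means the kernel reported plain readability with no hangup or error bit set, which is consistent with the duplicate notification being indistinguishable from a real one by its flags.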

So what we get is a false notification of a read event on a stream that we already read to EOF. I know that this is something that can happen, but I never thought that you'd get that on such a simple example that only communicates with a server on the local host. It never happened to me on 3 different machines, and not to the two technical reviewers.

But the right thing to do is to handle false wakeups like this one in the handle_events function. Unfortunately, fixing it in a way that changes the example minimally was somewhat difficult. I had to make multiple, although small, changes to main.rs.

The simplest way for me is to paste the new main.rs here. While I'm certain it will fix the issue, I would really appreciate it if you have the time to run it and confirm that it works for you too:

//! # FIXES:
//!
//! ## FIX ISSUE #4:
//! See: https://github.com/PacktPublishing/Asynchronous-Programming-in-Rust/issues/4
//! Some users reported false event notifications causing the counter to increase
//! due to the OS reporting a READ event after we already read the TcpStream to EOF.
//! This caused the counter to increment on the same TcpStream twice, thereby
//! exiting the program before all events were handled.
//!
//! The fix for this is to account for false wakeups, which is an easy fix but requires
//! a few changes to the example. I've added an explicit comment, "FIX #4", at the places
//! where I made a change so it's easy to spot the differences to the example code in the book.

use std::{
    // FIX #4 (import `HashSet`)
    collections::HashSet,
    io::{self, Read, Result, Write},
    net::TcpStream,
};

use ffi::Event;
use poll::Poll;

mod ffi;
mod poll;

/// Not the entire url, but everything after the domain addr
/// i.e. http://localhost/1000/hello => /1000/hello
fn get_req(path: &str) -> String {
    format!(
        "GET {path} HTTP/1.1\r\n\
             Host: localhost\r\n\
             Connection: close\r\n\
             \r\n"
    )
}

fn handle_events(
    events: &[Event],
    streams: &mut [TcpStream],
    handled: &mut HashSet<usize>,
) -> Result<usize> {
    let mut handled_events = 0;
    for event in events {
        let index = event.token();
        let mut data = vec![0u8; 4096];

        loop {
            match streams[index].read(&mut data) {
                Ok(n) if n == 0 => {
                    // FIX #4
                    // `insert` returns false if the value already existed in the set. We
                    // handle it here since we must be sure that the TcpStream is fully
                    // drained due to using edge triggered epoll.
                    if !handled.insert(index) {
                        break;
                    }
                    handled_events += 1;
                    break;
                }
                Ok(n) => {
                    let txt = String::from_utf8_lossy(&data[..n]);

                    println!("RECEIVED: {:?}", event);
                    println!("{txt}\n------\n");
                }
                Err(e) if e.kind() == io::ErrorKind::WouldBlock => break,
                // this was not in the book example, but it's an error condition
                // you probably want to handle in some way (either by breaking
                // out of the loop or trying a new read call immediately)
                Err(e) if e.kind() == io::ErrorKind::Interrupted => break,
                Err(e) => return Err(e),
            }
        }
    }

    Ok(handled_events)
}

fn main() -> Result<()> {
    let mut poll = Poll::new()?;
    let n_events = 5;

    let mut streams = vec![];
    let addr = "localhost:8080";

    for i in 0..n_events {
        let delay = (n_events - i) * 1000;
        let url_path = format!("/{delay}/request-{i}");
        let request = get_req(&url_path);
        let mut stream = std::net::TcpStream::connect(addr)?;
        stream.set_nonblocking(true)?;

        stream.write_all(request.as_bytes())?;
        // NB! Token is equal to index in Vec
        poll.registry()
            .register(&stream, i, ffi::EPOLLIN | ffi::EPOLLET)?;

        streams.push(stream);
    }

    // FIX #4: store the handled IDs
    let mut handled_ids = HashSet::new();

    let mut handled_events = 0;
    while handled_events < n_events {
        let mut events = Vec::with_capacity(10);
        poll.poll(&mut events, None)?;

        if events.is_empty() {
            println!("TIMEOUT (OR SPURIOUS EVENT NOTIFICATION)");
            continue;
        }

        // ------------------------------------------------------⌄ FIX #4 (new signature)
        handled_events += handle_events(&events, &mut streams, &mut handled_ids)?;
    }

    println!("FINISHED");
    Ok(())
}

When we build on this example from chapter 7 onwards, we do make it more robust and guard against false wakeups, so this example (and probably b-epoll-mio) are the only places where this could be an issue.
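The deduplication at the heart of the fix can be looked at in isolation. The following is a minimal sketch with no sockets involved, assuming each batch is simply a list of tokens whose stream reached EOF. The function name `count_finished` is hypothetical. It relies on `HashSet::insert` returning `false` when the value is already present:

```rust
use std::collections::HashSet;

/// Counts only the tokens in `eof_tokens` that have not been seen before.
/// `HashSet::insert` returns `true` for a new value and `false` for a repeat.
fn count_finished(eof_tokens: &[usize], handled: &mut HashSet<usize>) -> usize {
    eof_tokens.iter().filter(|&&token| handled.insert(token)).count()
}

fn main() {
    let n_events = 5;
    let mut handled = HashSet::new();
    let mut handled_events = 0;

    // Batches of EOF notifications; token 1 shows up twice, like the
    // duplicated notification reported earlier in this thread.
    let batches: [&[usize]; 4] = [&[4], &[3, 2], &[1, 1], &[0]];
    for batch in batches.iter() {
        handled_events += count_finished(batch, &mut handled);
    }

    // The duplicate is ignored, so the count matches the number of streams.
    println!("handled_events = {handled_events} of {n_events}");
}
```

A set keyed by token is a small change that fits this example. Tracking a per-stream "done" flag would work equally well, and is closer in spirit to keeping state alongside each stream.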

I tested the fix "in spirit" on my M1 machine, which also showed the same issue, with b-epoll-mio. It works now as expected.

@KarstenB, thanks for helping me out. I've implemented a fix for this that should be clearly explained in the code for new readers. The problem is now handled in both a-epoll and b-epoll-mio.

Sorry for not foreseeing this potential issue earlier in the process, and thanks for taking the time to report it.

For anyone implementing this with kqueue: I had the same issue, and I fixed it by adding the EV_ONESHOT flag to the Event struct when registering events.