smol-rs / async-channel

Async multi-producer multi-consumer channel


Channel sometimes uses 100% of CPU

mvucenovic opened this issue

In some workloads, the current implementation can lead to a CPU core being maxed out even though there is not much work going on. I've managed to reproduce this behavior in a small example with one producer and multiple consumers, where the producer is slower than the consumers. Here is a gist with the code:

https://gist.github.com/mvucenovic/12221d23211ab7e22989de517a799b7f

With this code, the issue is reproducible with the unbounded channel and with bounded channels of capacity larger than 1.

Note that this is not a correctness bug: the system will not deadlock or starve, but it will use 100% of a CPU core, presumably because of the unrestricted spin loop in the recv method.

It is possible, though I have not tried it, that the issue could also occur when there are multiple producers that are faster than the consumer.

I am fairly certain the problem is unnecessary wakeups (recv_ops notifications) in the recv method, here: https://github.com/stjepang/async-channel/blob/master/src/lib.rs#L514

Here is a hypothetical sequence of events (one that I think is realistic, but I could be wrong) for the case where we have 3 receivers blocked on recv_ops and an empty queue:

  1. A message is sent to the queue.
  2. On the successful try_send, the sender notifies recv_ops, so one of the 3 receivers (A) is woken.
  3. If the channel is unbounded or has capacity larger than 1, receiver A notifies recv_ops, then on the next loop iteration calls try_recv and receives the message sent in step 1.
  4. Receiver B, woken by receiver A, notifies recv_ops (waking receiver C), calls try_recv (empty), and then sleeps on a listener.
  5. Receiver C, woken by receiver B, notifies recv_ops (waking receiver B), calls try_recv (empty), and then sleeps on a listener.
  6. Steps 4 and 5 repeat until a new message arrives, spinning and wasting CPU cycles.
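In pseudocode, the receive loop implied by the steps above looks roughly like this (a sketch of the behavior described, not the actual async-channel source; the method names are stand-ins for the real internals):

```rust
// Sketch only: each woken receiver re-notifies recv_ops before parking
// again, so two idle receivers can keep waking each other while the
// queue stays empty.
loop {
    // Try to take a message first.
    if let Ok(msg) = self.try_recv() {
        return Ok(msg);
    }
    // Register for a wakeup, then unconditionally pass a notification on.
    let listener = self.recv_ops.listen();
    self.recv_ops.notify(1); // wakes a peer even when the queue is empty
    listener.await;          // woken either by a sender or by that peer
}
```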

Now, the thing that I do not understand is why we even need to notify recv_ops as a receiver once we have been woken by our listener. I have a scenario in my head where we should do that to avoid consuming a notification not meant for us, but only if we were woken in the previous loop iteration AND got Ok(_) from try_recv, not in the general case.
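That guarded-notify idea can be checked with a small, self-contained model (plain Rust, no async; `simulate` and its queue/parked structures are illustrative inventions for this comment, not async-channel internals). With the unconditional re-notify, two idle receivers wake each other indefinitely after a single message; re-notifying only after a successful receive stops the chain:

```rust
use std::collections::VecDeque;

// Deterministic model of the wakeup chain: one message is sent while
// three receivers (A, B, C) are parked on recv_ops.
// Returns the number of wakeups, capped at `cap` to keep the buggy
// variant from looping forever.
fn simulate(guarded: bool, cap: usize) -> usize {
    let mut queue: VecDeque<i32> = VecDeque::from([42]); // one sent message
    let mut parked: VecDeque<char> = VecDeque::from(['A', 'B', 'C']);
    let mut pending_notify = 1usize; // the sender's try_send notified once
    let mut wakeups = 0usize;

    while pending_notify > 0 && wakeups < cap {
        pending_notify -= 1;
        let receiver = match parked.pop_front() {
            Some(r) => r,
            None => break,
        };
        wakeups += 1;
        if !guarded && !parked.is_empty() {
            // Buggy pattern: re-notify unconditionally on every wakeup.
            pending_notify += 1;
        }
        if queue.pop_front().is_some() {
            // Got a message: this receiver completes. In the guarded
            // variant, only now do we pass an excess notification along.
            if guarded && !parked.is_empty() {
                pending_notify += 1;
            }
        } else {
            // Queue empty: park again and wait for the next notification.
            parked.push_back(receiver);
        }
    }
    wakeups
}

fn main() {
    // Unguarded: B and C ping-pong until the cap is hit.
    println!("unguarded: 1 message, {} wakeups", simulate(false, 1_000));
    // Guarded: A receives, B wakes once, finds nothing, and stays parked.
    println!("guarded:   1 message, {} wakeups", simulate(true, 1_000));
}
```

In the model, one message costs 1,000 (capped) wakeups without the guard and only 2 with it, which matches the spin-loop behavior observed above.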

I have the exact same problem, and I spent the last two days tracking it down (I thought it was in my own complicated multiple-futures-and-select! code) and trying to reproduce it, only to find that someone beat me to it by 8 hours! :) My example is smaller, though:

async fn receiver(rx: async_channel::Receiver<i32>, id: u32) {
    while let Ok(item) = rx.recv().await {
        println!("{}: got {}", id, item);
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = async_channel::bounded(5000);

    tokio::spawn(receiver(rx.clone(), 2));
    tokio::spawn(receiver(rx.clone(), 3));

    // Slow producer: the consumers are woken far more often than items arrive.
    for i in 0..10 {
        tx.send(i).await.expect("send failed");
        tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    }

    // Keep the process alive and watch CPU usage climb to 100%.
    tokio::time::sleep(std::time::Duration::from_secs(300)).await;
}

Here's a fix: https://github.com/stjepang/async-channel/pull/14/files

Does anyone want to try it out?

The fix is working for me and the spin loop is gone. Thank you for your quick response.

Looking good for both bounded and unbounded channels.

Fixed in v1.4.2