smol-rs / async-channel

Async multi-producer multi-consumer channel


Channel sometimes uses 100% of CPU

mvucenovic opened this issue

In some workloads, the current implementation can lead to a CPU core being maxed out even though there is not much work going on. I've managed to reproduce this behavior in a small example with one producer and multiple consumers, where the producer is slower than the consumers. Here is a gist with the code:

https://gist.github.com/mvucenovic/12221d23211ab7e22989de517a799b7f

With this code, the issue is reproducible with the unbounded channel and with bounded channels of capacity larger than 1.

Note that this is not a correctness bug: the system will not deadlock or starve, but it will use 100% of a CPU core, presumably because of the unrestricted spin loop in the recv method.

It is possible, though I have not tried it, that the issue could also occur when there are multiple producers that are faster than the consumer.

I am fairly certain the problem is unnecessary wakeups (recv_ops notifications) in the recv method, here: https://github.com/stjepang/async-channel/blob/master/src/lib.rs#L514

Here is a hypothetical sequence of events (one that I think is realistic, but I could be wrong) for the case where we have 3 receivers blocked on recv_ops and an empty queue:

  1. A message is sent to the queue.
  2. On the successful try_send, the sender notifies recv_ops, so one of the 3 receivers (A) is woken.
  3. If the channel is unbounded or has capacity larger than 1, receiver A notifies recv_ops, then on the next loop iteration calls try_recv and receives the message sent in step 1.
  4. Receiver B, woken by receiver A, notifies recv_ops (waking receiver C), calls try_recv (empty), and then sleeps on a listener.
  5. Receiver C, woken by receiver B, notifies recv_ops (waking receiver B), calls try_recv (empty), and then sleeps on a listener.
  6. Steps 4 and 5 repeat until a new message arrives, spinning and wasting CPU cycles.
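In pseudocode, the receive loop implied by the steps above looks roughly like this (a sketch of the behavior described, not the actual async-channel source; the method names are stand-ins for the real internals):

```rust
// Sketch only: each woken receiver re-notifies recv_ops before parking
// again, so two idle receivers can keep waking each other while the
// queue stays empty.
loop {
    // Try to take a message first.
    if let Ok(msg) = self.try_recv() {
        return Ok(msg);
    }
    // Register for a wakeup, then unconditionally pass a notification on.
    let listener = self.recv_ops.listen();
    self.recv_ops.notify(1); // wakes a peer even when the queue is empty
    listener.await;          // woken either by a sender or by that peer
}
```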

Now, the thing that I do not understand is why we even need to notify recv_ops as a receiver once we have been woken by our listener. I have a scenario in my head where we should do that to avoid consuming a notification not meant for us, but only if we were woken in the previous loop iteration AND got Ok(_) from try_recv, not in the general case.
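That guarded-notify idea can be checked with a small, self-contained model (plain Rust, no async; `simulate` and its queue/parked structures are illustrative inventions for this comment, not async-channel internals). With the unconditional re-notify, two idle receivers wake each other indefinitely after a single message; re-notifying only after a successful receive stops the chain:

```rust
use std::collections::VecDeque;

// Deterministic model of the wakeup chain: one message is sent while
// three receivers (A, B, C) are parked on recv_ops.
// Returns the number of wakeups, capped at `cap` to keep the buggy
// variant from looping forever.
fn simulate(guarded: bool, cap: usize) -> usize {
    let mut queue: VecDeque<i32> = VecDeque::from([42]); // one sent message
    let mut parked: VecDeque<char> = VecDeque::from(['A', 'B', 'C']);
    let mut pending_notify = 1usize; // the sender's try_send notified once
    let mut wakeups = 0usize;

    while pending_notify > 0 && wakeups < cap {
        pending_notify -= 1;
        let receiver = match parked.pop_front() {
            Some(r) => r,
            None => break,
        };
        wakeups += 1;
        if !guarded && !parked.is_empty() {
            // Buggy pattern: re-notify unconditionally on every wakeup.
            pending_notify += 1;
        }
        if queue.pop_front().is_some() {
            // Got a message: this receiver completes. In the guarded
            // variant, only now do we pass an excess notification along.
            if guarded && !parked.is_empty() {
                pending_notify += 1;
            }
        } else {
            // Queue empty: park again and wait for the next notification.
            parked.push_back(receiver);
        }
    }
    wakeups
}

fn main() {
    // Unguarded: B and C ping-pong until the cap is hit.
    println!("unguarded: 1 message, {} wakeups", simulate(false, 1_000));
    // Guarded: A receives, B wakes once, finds nothing, and stays parked.
    println!("guarded:   1 message, {} wakeups", simulate(true, 1_000));
}
```

In the model, one message costs 1,000 (capped) wakeups without the guard and only 2 with it, which matches the spin-loop behavior observed above.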

I have the exact same problem, and I spent the last two days tracking it down (I thought it was in my own complicated multiple-futures-and-select! code) and trying to reproduce it, only to find that someone beat me to it by 8 hours! :) My example is smaller, though:

async fn receiver(rx: async_channel::Receiver<i32>, id: u32) {
    while let Ok(item) = rx.recv().await {
        println!("{}: got {}", id, item);
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = async_channel::bounded(5000);

    tokio::spawn(receiver(rx.clone(), 2));
    tokio::spawn(receiver(rx.clone(), 3));

    // Slow producer: the consumers are woken far more often than items arrive.
    for i in 0..10 {
        tx.send(i).await.expect("send failed");
        tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    }

    // Keep the process alive and watch CPU usage climb to 100%.
    tokio::time::sleep(std::time::Duration::from_secs(300)).await;
}

Here's a fix: https://github.com/stjepang/async-channel/pull/14/files

Does anyone want to try it out?

The fix is working for me and the spin loop is gone. Thank you for your quick response.

Looking good for both bounded and unbounded channels.

Fixed in v1.4.2