tokio-rs / tokio-uring

An io_uring backed runtime for Rust

multi_thread support

Noah-Kennedy opened this issue · comments

Tracking issue for support for multi_thread runtime.

Is there any limitation that prevents us from simply using the multithreaded equivalents of the types currently used in the library (e.g., Rc → Arc, RefCell → Mutex, etc.)?

@Alonely0 It's possible to configure io_uring such that it is invalid for a submission to come from more than one thread. I suspect there are other things to consider as well.

tokio_uring::builder().uring_builder(
    tokio_uring::uring_builder().setup_single_issuer()
)

I think multi-thread support is essential

I use tokio to write a VPN program. When using the rt-multi-thread feature and tokio::spawn, iperf3 TCP can reach 600 Mbps in my test environment. However, when using tokio_uring to handle the IO, the single-threaded tokio_uring only reaches 120 Mbps, and one CPU core hits 100% usage on the iperf3 server.

tokio: [screenshot of iperf3 results]

tokio_uring: [screenshot of iperf3 results]

I have set sqpoll to try to reduce the submit_and_wait syscalls.
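For reference, a sketch (untested, and assuming the tokio-uring and io-uring builder APIs shown earlier in this thread) of how SQPOLL can be enabled when starting the runtime; setup_sqpoll takes an idle timeout in milliseconds before the kernel polling thread goes to sleep:

```rust
// Sketch: configure the runtime with SQPOLL enabled so the kernel
// thread polls the submission queue, reducing submit syscalls.
tokio_uring::builder()
    .uring_builder(
        tokio_uring::uring_builder().setup_sqpoll(1000), // idle timeout in ms
    )
    .start(async {
        // application code here
    });
```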

I notice that an Op holds a weak reference to the thread-local (TLS) RuntimeContext; when the future is polled, it checks the lifecycle in the Ops table, and that table lives in the RuntimeContext.

If the Op could instead hold an Arc-based weak reference to the RuntimeContext, then it would not matter which thread the RuntimeContext lives on, which might make multi-thread support easier:

  • when creating an Op, use the current thread's RuntimeContext to submit the SQE
  • when polling the future, the Op uses the Arc-based weak reference to find the RuntimeContext and check whether the work is done, regardless of whether the Op is polled on the same thread that submitted the SQE
  • multiple threads can then each run their own io_uring instance, making use of more CPU cores

So, in general the easiest path to doing this is going to be something similar to what @Sherlock-Holo described; however, it isn't clear to me that this will perform terribly well. Contention on the submission queue may be an issue, so there would be a bit of "wait and see" with respect to which means of doing this ultimately sticks.

For now I'd recommend a runtime-per-core model of some sort. Depending on your workload, that will probably work quite well.