tokio-rs / tokio-uring

An io_uring backed runtime for Rust

multi_thread support

Noah-Kennedy opened this issue · comments

Tracking issue for support for multi_thread runtime.

Is there any limitation that prevents us from simply using the multithreaded equivalents of the types currently used in the library (e.g., Rc → Arc, RefCell → Mutex, etc.)?

@Alonely0 It's possible to configure io_uring such that it is invalid for a submission to come from more than one thread. I suspect there are other things to consider as well.

tokio_uring::builder().uring_builder(
    tokio_uring::uring_builder().setup_single_issuer()
)

I think multi-thread support is essential

I use tokio to write a VPN program. When using the rt-multi-thread feature and tokio::spawn, iperf3 TCP can reach 600 Mbps in my test environment. However, when using tokio_uring to handle the IO, the single-threaded tokio_uring only reaches 120 Mbps, and one CPU core hits 100% usage on the iperf3 server.

tokio: [screenshot of iperf3 results]

tokio_uring: [screenshot of iperf3 results]

I have set sqpoll to try to reduce the submit_and_wait syscalls.
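For reference, a sketch (untested, and assuming the tokio-uring and io-uring builder APIs shown earlier in this thread) of how SQPOLL can be enabled when starting the runtime; setup_sqpoll takes an idle timeout in milliseconds before the kernel polling thread goes to sleep:

```rust
// Sketch: configure the runtime with SQPOLL enabled so the kernel
// thread polls the submission queue, reducing submit syscalls.
tokio_uring::builder()
    .uring_builder(
        tokio_uring::uring_builder().setup_sqpoll(1000), // idle timeout in ms
    )
    .start(async {
        // application code here
    });
```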

I notice that an Op holds a weak reference to the thread-local (TLS) RuntimeContext; when the future is polled, it checks the lifecycle in the Ops table, and that table lives in the RuntimeContext.

If the Op could instead hold an Arc-based weak reference to the RuntimeContext, then it would not matter which thread the RuntimeContext lives on, which might make multi-thread support easier:

  • when creating an Op, use the current thread's RuntimeContext to submit the SQE
  • when polling the future, the Op uses the Arc-based weak reference to find the RuntimeContext and check whether the work is done, regardless of whether the Op is polled on the same thread that submitted the SQE
  • multiple threads can then each run their own io_uring instance, making use of more CPU cores

So, in general the easiest path to doing this is going to be something similar to what @Sherlock-Holo described; however, it isn't clear to me that this will perform terribly well. Contention on the submission queue may be an issue, so there would be a bit of "wait and see" with respect to which means of doing this ultimately sticks.

For now I'd recommend a runtime-per-core model of some sort. Depending on your workload, that will probably work quite well.