alibaba / PhotonLibOS

Probably the fastest coroutine lib in the world!

Home Page: https://PhotonLibOS.github.io

If there is only one connection, what's the best practice of taking full advantage of multiple cores?

jiangdongzi opened this issue · comments

If there is only one connection, what's the best practice of taking full advantage of multiple cores?
  1. Use thread_migrate or WorkPool::async_call to dispatch your tasks to different vCPUs, and wait for the tasks to finish (a sketch follows this list).
  2. Either switch back to the previous thread that serves the connection, or write the response back directly from the current thread, with locks.
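
A minimal sketch of option 1, assuming Photon's WorkPool API works as described here: `WorkPool::call` is assumed to block the calling coroutine until the task finishes on a worker vCPU, and `async_call` to take ownership of a heap-allocated callable. Check photon/thread/workerpool.h for the exact signatures.

```cpp
#include <photon/photon.h>
#include <photon/thread/workerpool.h>

int main() {
    // Initialize the current std::thread as vCPU 0.
    photon::init(photon::INIT_EVENT_DEFAULT, photon::INIT_IO_NONE);

    // 4 worker vCPUs, each backed by its own std::thread.
    photon::WorkPool pool(4, photon::INIT_EVENT_DEFAULT, photon::INIT_IO_NONE);

    // Synchronous dispatch: the task runs on a worker vCPU while this
    // coroutine sleeps; execution resumes here once the task is done.
    int result = 0;
    pool.call([&] { result = 42; /* CPU-heavy work goes here */ });

    // Asynchronous dispatch (fire-and-forget): the pool is assumed to
    // delete the heap-allocated callable after running it.
    pool.async_call(new auto([] { /* background work */ }));

    photon::fini();
    return 0;
}
```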

I read the source code and found that Photon uses just one queue shared among the pool's std::threads, so contention may be fierce.
Although Photon uses a lock-free method, the atomic value may still change very frequently, causing frequent CPU cache misses. I have benchmarked multiple threads operating on a single atomic value, and the cost is very high, roughly 200 cycles per operation. I think work stealing is a better approach, like Golang's.
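
For reference, a standalone micro-benchmark (plain C++, not Photon-specific; absolute numbers are machine-dependent) that demonstrates the cost being described: the cache line holding the counter ping-pongs between cores on every fetch_add.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    constexpr int  kThreads = 8;
    constexpr long kIters   = 10'000'000;
    std::atomic<long> counter{0};

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int i = 0; i < kThreads; i++)
        workers.emplace_back([&] {
            // Every increment forces the counter's cache line to move
            // between cores; that transfer is what costs hundreds of cycles.
            for (long j = 0; j < kIters; j++)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& t : workers) t.join();

    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  std::chrono::steady_clock::now() - start).count();
    printf("%.1f ns per fetch_add under contention\n",
           double(ns) / (double(kThreads) * double(kIters)));
}
```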

There have been many discussions on work stealing, but the fact is that it is still under development and at an early stage. It would be appreciated if you could contribute.

I think work stealing is a better approach, like Golang's.

thread_migrate is as efficient as work-stealing. You may migrate a thread to a random vCPU in the pool, as in the sketch below.
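
A hedged sketch of that pattern; `photon::get_vcpu`, `photon::thread_migrate`, and `WorkPool::thread_migrate` are used here as I understand them from thread.h and workerpool.h, so treat the exact signatures as assumptions.

```cpp
#include <photon/thread/thread.h>
#include <photon/thread/workerpool.h>

// Placeholder for the application's CPU-heavy request handling.
void process_request();

void handle_one_request(photon::WorkPool& pool) {
    // Remember the vCPU that owns the connection.
    auto* origin = photon::get_vcpu();

    // Hop this coroutine onto one of the pool's worker vCPUs.
    pool.thread_migrate(photon::CURRENT);

    process_request();  // heavy work now runs off the connection's vCPU

    // Hop back, so the response is written by the vCPU that owns the
    // connection and no locks are needed on the socket.
    photon::thread_migrate(photon::CURRENT, origin);
}
```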

thread_migrate is as efficient as work-stealing.

No, it should be more efficient than work-stealing. In this mode, lock contention occurs only pairwise, between the dispatching vCPU and each worker vCPU. Whereas in work-stealing, lock contention occurs among all of the vCPUs together, because every worker steals from the run queue of the dispatching vCPU.

That is to say, work-stealing should perform similarly to a lock-free ring queue in this scenario: both have a single queue shared and contended by all vCPUs. The sketch below contrasts the two contention topologies.
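
To make that concrete, here is a conceptual sketch with plain mutexes standing in for Photon's lock-free internals (all names are illustrative, not Photon APIs): in the dispatch model each queue is contended by at most two parties, while in the shared-queue model that work-stealing degenerates to here, every worker contends on the same queue head.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

using Task = void (*)();

// Dispatch model (thread_migrate / async_call): the dispatcher pushes
// into a dedicated queue per worker, so each lock is only ever
// contended by two parties: the dispatcher and that one worker.
struct PerWorkerQueues {
    struct Q { std::mutex m; std::deque<Task> q; };
    std::vector<Q> qs;
    explicit PerWorkerQueues(std::size_t n) : qs(n) {}
    void dispatch(std::size_t worker, Task t) {
        std::lock_guard<std::mutex> g(qs[worker].m);
        qs[worker].q.push_back(t);
    }
};

// Shared-queue model (work-stealing with a single producing vCPU):
// every worker pops from the same queue, so its lock (or the atomic
// head of a lock-free ring) is contended by all vCPUs at once.
struct SharedQueue {
    std::mutex m;
    std::deque<Task> q;
    bool steal(Task& out) {
        std::lock_guard<std::mutex> g(m);
        if (q.empty()) return false;
        out = q.front();
        q.pop_front();
        return true;
    }
};
```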