[Optimization] Provide option to reuse Netty's selectors to handle requests

Question


fwbrasil opened this issue 3 months ago

Tapir version: 1.9.11

Scala version: 3.3.3

Describe the bug

A common strategy to boost HTTP server performance in benchmarks is to reuse Netty's event loop selectors to handle requests directly on the thread that receives the payload. This avoids the overhead of switching execution to another thread, and once the request finishes processing, writing back to the channel is cheaper. If another thread handles the request, Netty has to use a more expensive mechanism to pass the response back to the selector: the write is submitted as a task to the channel's event loop, which may also need to wake up its selector.
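The difference between the two execution paths can be sketched with plain JDK executors. This is a hypothetical model, not Netty or Tapir code: a single named thread stands in for the event loop, a small pool stands in for the request-handling executor, and `ForkVsInline`, `run` and `Trace` are illustrative names. It only records which thread handles the request and which thread performs the write; it does not measure cost.

```scala
import java.util.concurrent.{CountDownLatch, Executors, ThreadFactory, TimeUnit}
import java.util.concurrent.atomic.AtomicReference

// Hypothetical model: "event-loop" stands in for a Netty event loop thread,
// "worker" threads stand in for a separate request-handling pool.
object ForkVsInline:

  private def named(name: String): ThreadFactory = (r: Runnable) => {
    val t = new Thread(r, name)
    t.setDaemon(true) // don't keep the JVM alive if a pool is left running
    t
  }

  final case class Trace(handledOn: String, writtenOn: String)

  def run(forkExecution: Boolean): Trace =
    val eventLoop = Executors.newSingleThreadExecutor(named("event-loop"))
    val workers = Executors.newFixedThreadPool(2, named("worker"))
    val handledOn = new AtomicReference[String]()
    val writtenOn = new AtomicReference[String]()
    val done = new CountDownLatch(1)

    def handle(request: String): String =
      handledOn.set(Thread.currentThread().getName)
      s"response to $request"

    def write(response: String): Unit =
      writtenOn.set(Thread.currentThread().getName)
      done.countDown()

    // The payload "arrives" on the event-loop thread.
    eventLoop.execute { () =>
      if !forkExecution then
        // Same thread: handle and write without leaving the event loop.
        write(handle("GET /ping"))
      else
        // Forked: hand off to a worker, then queue the write back onto the
        // event loop, mirroring how a write issued from a foreign thread
        // has to be scheduled as a task on the channel's loop.
        workers.execute { () =>
          val response = handle("GET /ping")
          eventLoop.execute(() => write(response))
        }
    }

    done.await(5, TimeUnit.SECONDS)
    eventLoop.shutdown()
    workers.shutdown()
    Trace(handledOn.get, writtenOn.get)
```

With `forkExecution = false` both steps report `event-loop`; with `forkExecution = true` the handler runs on `worker` and the write still ends up on `event-loop`, which is the extra hop described above.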

This mode of execution makes sense only for extremely low-latency endpoints with very little CPU usage, which are common in benchmarks. In real production workloads, it's generally better to keep the event loop always available to handle payloads; otherwise, there can be a large latency cost. We saw remarkable latency reductions at Twitter just by switching request handling to another thread pool. The only services that didn't perform well were the ones that did simple in-memory lookups.

How to reproduce?

I've added an option to the Tapir integration in Kyo to disable forking. You can compare performance by toggling it on and off: https://github.com/getkyo/kyo/blob/main/kyo-tapir/src/main/scala/kyo/server/NettyKyoServerOptions.scala#L17

Krzysztof Ciesielski · Answer 1 · Fri Mar 08 2024 15:54:12 GMT+0800 (China Standard Time)

Thanks, I wasn't aware of this possibility. It sounds like something we might want to configure at the endpoint level instead of through a global forkExecution setting. This would require changes in a few places to propagate it properly. What do you think?

Flavio Brasil · Answer 2 · Sun Mar 10 2024 08:42:13 GMT+0800 (China Standard Time)

Good point! I normally see this kind of configuration applied globally to a server, but endpoint-level control of the behavior could be very useful, since typically only a few endpoints can benefit from the optimization in production.
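A minimal sketch of the endpoint-level control discussed in this thread, assuming a server-wide default (the role forkExecution plays today) that individual endpoints may override. `ExecutionMode`, `ServerOptions`, `EndpointSpec` and `resolve` are hypothetical names, not Tapir or Kyo APIs; a real implementation would still need to propagate the resolved mode into the Netty handler, which is the plumbing mentioned above.

```scala
// Hypothetical types: neither Tapir nor Kyo defines these.
object EndpointExecution:

  enum ExecutionMode:
    case Fork        // hand the request off to a separate executor
    case OnEventLoop // run the handler directly on the event loop thread

  // Server-wide default, analogous to a global forkExecution flag.
  final case class ServerOptions(defaultMode: ExecutionMode)

  // None means "no opinion": the endpoint inherits the server default.
  final case class EndpointSpec(path: String, modeOverride: Option[ExecutionMode] = None)

  // An explicit endpoint-level setting wins; otherwise fall back to the server default.
  def resolve(server: ServerOptions, endpoint: EndpointSpec): ExecutionMode =
    endpoint.modeOverride.getOrElse(server.defaultMode)
```

For example, a server could default to `ExecutionMode.Fork` and mark only a cheap in-memory lookup such as `EndpointSpec("/ping", Some(ExecutionMode.OnEventLoop))` to run on the event loop. Using `Option` keeps "not set" distinct from an explicit choice, so changing the server default never silently overrides an endpoint that opted in or out.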