Netflix / rend

A memcached proxy that manages data chunking and L1 / L2 caches

Reduce syscalls by batching requests and responses

ScottMansfield opened this issue · comments

Right now, a large amount of CPU time is spent on syscalls in a production deployment. This is partly because Rend does no batching of requests or responses. This issue is to create a new handlers.Handler implementation that would opportunistically batch requests to the backend, and possibly another implementation of the common.Responder interface for the binary protocol that batches responses.

Handler

The new handler implementation will maintain a pool of connections to the backend instead of the one-connection-per-external-connection model used before. This provides a few benefits:

  1. The complexity can be hidden behind the same straightforward Handler interface
  2. A likely increase in throughput
  3. Lower number of open file descriptors (though this hasn't been a problem yet)
  4. More opportunities for batching requests through pooling

Unfortunately, the sync.Pool implementation clears the pooled objects on every garbage collection (a limitation which, if removed, would be helpful in many other places), so it cannot be used as a connection pool. The connections will need queues in order to batch requests as they come in. This leads to the list of drawbacks of this approach:

  1. Much more complex code to manage concurrency
  2. Latency will likely increase as a cost of having higher throughput

A concern is the management of the connection pool. It would be best if the pool could scale to the load instead of having a fixed size, though it would be easier to simply have a configuration value that fixes the pool size. The fixed-size option takes more operational effort to tune: if the pool is too large, we keep the same syscall-overhead problem we have now; if it's too small, the connections become the bottleneck. The simpler option will probably win out in this case, but I'll elaborate thoughts on the complex one anyway.

There are a couple of ways to react to changes in load:

  1. Internal tracking of queue sizes for the pooled connections. If the queues are too long (size) for too long (time), then we add another connection.
  2. A fixed ratio of internal connections to external ones. This is dangerous because the external connections might be mostly dormant, in which case the overhead comes back.

I think the best way is to dynamically track queue length and use a high-water-mark size: the pool scales up but never back down. Lower-load periods get better latency, and higher-load periods no longer pay the cost of scaling connections up on demand. Spiky loads in particular benefit from this.

Responder

This is a little trickier because there may be many connections each doing little traffic, in which case batching would not help. When many requests come from a single connection, however, batching the responses reduces the number of required syscalls.

An implementation could batch only when the connection is doing a lot of requests, and skip batching when it is not. This would require the responder to keep track of how many responses per second it sees. An easy implementation of this would be to flush only every nth response out of the buffered writer when the connection is doing a lot of traffic.