smfrpc / smf

Fastest RPC in the west

Home Page: http://smfrpc.github.io/smf/


RPC connection idleness timeout should be independent of request timeout

nacl opened this issue

Is your feature request related to a problem? Please describe.

I've noticed that SMF connections time out when left idle for a while. What I've also noticed is that RPCs time out at the same rate, which leaves few options for keeping long-lived persistent connections while still using relatively short request timeouts.

They are both currently configured using this value in rpc_client_opts (rpc_client.h):

  /// \brief The default timeout PER connection body. After we
  /// parse the header of the connection we need to
  /// make sure that we at some point receive some
  /// bytes or expire the connection. Effectively infinite
  typename seastar::timer<>::duration recv_timeout = std::chrono::minutes(1);

That maps to this field in rpc_connection_limits (rpc_connection_limits.h):

 const timer_duration_t max_body_parsing_duration;
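
For context, this is roughly what tuning that knob looks like today (a sketch; only the rpc_client_opts field quoted above is assumed, and the include path may differ):

  #include <chrono>
  #include "smf/rpc_client.h"  // include path assumed

  smf::rpc_client_opts opts;                     // the struct quoted above
  opts.recv_timeout = std::chrono::minutes(10);  // widens BOTH the idle timeout and the per-request timeout
  // ... then pass `opts` to the smf_gen-generated client as usual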

I'm not sure whether separating these is viable because, at least according to my limited understanding, an RPC slot is effectively lost forever if an RPC never returns.

Describe the solution you'd like
Some way to independently configure connection and RPC timeouts.

Describe alternatives you've considered

None of these are really satisfactory IMO, as they add extra burden to the client software:

  • Something that would periodically check if the connection is alive, and reopen it if it isn't.

    This would block requests while the TCP connection comes back up, and pushes the complexity of waiting for the RPC connection to be re-established onto every in-flight request.

  • Reopening the connection when needed upon closure.

    This has the same problems as the previous one.

Another possibility, though it may not be easy to implement:

  • A periodic "heartbeat" sent from the client to the server.

    This wouldn't really solve the problem without a way to configure connection closure timeouts, although that burden could be pushed onto the developer using the library.

    As it currently stands (and as I understand it), this would require a custom client and server RPC, and (AFAICT) it would need to be done for every RPC connection to keep it active. A rough sketch follows this list.
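
To make the heartbeat idea concrete, a minimal sketch of such a loop is below. `ping` stands in for whatever no-op RPC the application would define; it is not part of smf, and the seastar header locations vary between versions:

  #include <chrono>
  #include <seastar/core/future-util.hh>  // seastar::repeat / stop_iteration (seastar/core/loop.hh in newer versions)
  #include <seastar/core/sleep.hh>

  // Keep a connection warm by issuing a no-op RPC at an interval shorter than
  // the server's idle timeout. PingFn is any callable returning seastar::future<>.
  template <typename PingFn>
  seastar::future<> heartbeat_loop(PingFn ping, std::chrono::seconds interval) {
    return seastar::repeat([ping = std::move(ping), interval]() mutable {
      return ping()
        .handle_exception([](std::exception_ptr) { /* ignore; the next real RPC will surface errors */ })
        .then([interval] { return seastar::sleep(interval); })
        .then([] { return seastar::stop_iteration::no; });
    });
  }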

Thoughts on this would be greatly appreciated; this is by no means a trivial problem.

I think if you configure the above settings with std::numeric_limits<int>::max() for the keepalive count, and set both the keepalive idle time and the probe interval to be less than the client timeout, it should work out.

I'm pretty sure this is supposed to be handled at the TCP level, though.
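
If it helps, here is a rough sketch of that TCP-level configuration using seastar's socket API; the assumption is that you can reach the underlying seastar::connected_socket, which smf does not currently expose:

  #include <chrono>
  #include <limits>
  #include <seastar/net/api.hh>

  void enable_keepalive(seastar::connected_socket &fd) {
    seastar::net::tcp_keepalive_params ka{};
    ka.idle = std::chrono::seconds(30);          // both well below the 1-minute recv_timeout default
    ka.interval = std::chrono::seconds(10);
    ka.count = std::numeric_limits<int>::max();  // effectively unlimited probes, per the suggestion above
    fd.set_keepalive(true);
    fd.set_keepalive_parameters(ka);
  }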

Now, there is an interesting set of generic functions you could write; for example (rough sketch):

  // sketch, not existing smf API: retry an async operation; Func returns a seastar::future<T>
  template <typename Func>
  auto do_with_retries(Func f, int retry_count = 3) -> std::invoke_result_t<Func> {
    using future_t = std::invoke_result_t<Func>;  // needs <type_traits>
    auto fut = f();  // start the first attempt before f is moved into the continuation
    return fut.then_wrapped([f = std::move(f), retry_count](future_t r) mutable -> future_t {
      if (!r.failed() || retry_count <= 1) { return std::move(r); }  // done, or retries exhausted
      r.ignore_ready_future();  // consume the failure before trying again
      return do_with_retries(std::move(f), retry_count - 1);
    });
  }

So we could have a library of rpc_client-specific drivers: one for retries, one for auto-reconnect, one for every form of recoverable failure.

They become reusable components applicable to all drivers.

That way you get function composition.

  return smf::with_retries(smf_reconnect([] { /* the RPC call */ }), 3 /*retries*/);
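
To make that concrete, here is a hedged sketch of what the reconnect wrapper could look like, reusing the do_with_retries sketch above; with_reconnect, client.connect(), SayHello, and req are illustrative names rather than existing smf API:

  // Illustrative only: (re)establish the connection before each attempt, then
  // let do_with_retries retry the whole sequence on failure.
  template <typename Client, typename Call>
  auto with_reconnect(Client &client, Call call) {
    return [&client, call = std::move(call)]() mutable {
      return client.connect()  // assumed cheap / idempotent when already connected
          .then([call]() mutable { return call(); });
    };
  }

  // usage: up to three attempts, reconnecting before each one
  return do_with_retries(
    with_reconnect(client, [&client, req] { return client.SayHello(req); }),
    3 /*retries*/);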

@nacl let me know if the comment in #350 makes sense.