smfrpc / smf

Fastest RPC in the west

Home Page: http://smfrpc.github.io/smf/


Connection reconnect on disconnect

Lwelch45 opened this issue · comments

Is your feature request related to a problem? Please describe.
Should smf handle reconnects under the hood, and if so, what would that look like?

Describe the solution you'd like
I thought about this more after leaving the comment #289 (comment). If we can better surface the errors related to clients not being connected to the server, then it might be better to leave reconnecting in the hands of the user, as is the standard contract in other projects.
#289 (comment) would seem to be the right thing to do here?

Describe alternatives you've considered
client.register_outgoing_filter<smf::retry_tcp_conn<3>>(). The concern here is that the filters are rightfully unaware of anything related to the connection; having to propagate connection state into the filters would add extra complexity and could turn them into a choke point.

Additional context
[Off topic] You mentioned that other groups/folks are working on a Raft implementation on top of smf. Is any of it public, or is there anyone I can connect with?

I've been fixing backpressure throughout with a bit more test coverage.

I added a method to the main API that does, I think, what you want.

// ....... snip

SMF_ALWAYS_INLINE virtual bool
is_conn_valid() const final {
  return conn_ && conn_->is_valid();
}

// ......... snip

seastar::future<>
rpc_client::reconnect() {
  fail_outstanding_futures();
  return stop().then([this] {
    conn_ = nullptr;
    return connect();
  });
}

that way you can get the usage you want

auto client = seastar::make_shared<SmfStorageDemoClient>( seastar::ipv4_addr{ ... } );

... snip
client->Get( data_request )
       .then( ... )
       .handle_exception([ client ] (auto e) {
          LOG_INFO("Handling exception by retry: {}", e);
          return client->reconnect().then([ client ] { try_again( counter ++ ); });
       });

I think we need a different mechanism for these higher level goals that are not at the filter level yet.

@Lwelch45 w.r.t raft. I can ask. I'm not sure they are OSS'ing their impl sadly.

@Lwelch45 - Hystrix is more like the api i envision on top of this.

https://medium.com/@NetflixTechBlog/performance-under-load-3e6fa9a60581

Note that we already have many of these concurrency backpressures in place at no cost.

  1. Memory backpressure (server and client so you don't OOM)
  2. Timeout backpressure (configurable per connection)
  3. Socket exclusive backpressure - only one fiber flushes at a time per socket for correct serialization of header + payload
  4. Gates - we can block (leave a future unfulfilled) until all previously sent requests either come back or are canceled/exceptioned
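The memory backpressure in (1) can be pictured with a toy admission budget. This is an illustration only; the names `memory_budget` and `try_admit` are made up, and smf actually implements this with seastar semaphores and futures rather than a boolean check:

```cpp
#include <cstddef>

// Sketch: a request is admitted only if its size fits in the remaining
// budget; otherwise the caller must wait (backpressure) instead of
// allocating and risking OOM.
class memory_budget {
 public:
  explicit memory_budget(std::size_t limit) : available_(limit) {}

  // Returns false when admitting `bytes` would exceed the budget.
  bool try_admit(std::size_t bytes) {
    if (bytes > available_) return false;  // apply backpressure
    available_ -= bytes;
    return true;
  }

  // Called when a request finishes and its memory is returned.
  void release(std::size_t bytes) { available_ += bytes; }

  std::size_t available() const { return available_; }

 private:
  std::size_t available_;
};
```

In seastar the `try_admit`/wait pair is a semaphore `wait(bytes)` that yields a future, so a too-large request parks instead of failing.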

What we are missing are high-level strategies, such as circuit breakers and exponential backoff.
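One of those missing strategies, exponential backoff, can be sketched in plain blocking C++. This is not smf code: `retry_with_backoff` is a hypothetical helper, and a seastar-native version would return a `seastar::future<>` and use `seastar::sleep()` instead of blocking the thread:

```cpp
#include <chrono>
#include <cstdio>
#include <stdexcept>
#include <thread>

// Hypothetical helper: retries `op` up to `max_attempts` times, doubling
// the wait after each failure (exponential backoff).
template <typename Op>
auto retry_with_backoff(Op op, int max_attempts,
                        std::chrono::milliseconds base_delay) {
  for (int attempt = 0;; ++attempt) {
    try {
      return op();
    } catch (const std::exception &e) {
      if (attempt + 1 >= max_attempts) throw;   // budget exhausted: give up
      auto delay = base_delay * (1 << attempt); // 1x, 2x, 4x, ...
      std::fprintf(stderr, "attempt %d failed (%s); retrying in %lld ms\n",
                   attempt, e.what(),
                   static_cast<long long>(delay.count()));
      std::this_thread::sleep_for(delay);
    }
  }
}
```

Combined with the `rpc_client::reconnect()` method above, the retried operation could reconnect on failure before issuing the call again.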

We already have a filter api - see filter.h and a way to apply them.

i.e.: take a T and return a T

template <typename T>
struct filter {
  seastar::future<T> operator()(T &&t);
};
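As a standalone illustration of that take-a-T-return-a-T contract, here is a version with plain values instead of `seastar::future<T>` so it compiles without seastar; `payload` and `apply_filters` are made-up names for this sketch, not smf's actual API:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Illustrative only: smf's real filters return seastar::future<T>.
struct payload {
  std::string body;
};

// A filter consumes a T by rvalue reference and hands back a T.
using filter_fn = std::function<payload(payload &&)>;

// Run a payload through each registered filter in order.
payload apply_filters(payload p, const std::vector<filter_fn> &filters) {
  for (const auto &f : filters) p = f(std::move(p));
  return p;
}
```

Because each filter only sees the payload, nothing about the underlying connection leaks into this layer, which is the separation the thread above argues for.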

The last few things are what I'm thinking through.

Those two methods, alongside the explicit failure in the returned future, will be more than enough to get me going on what I'm working on for now. As I get deeper into it, I'll try to think of solutions to some of the higher-level strategies.

Sounds good. I'm going to merge #289 tomorrow. Feel free to give that branch a shot. It works and is better tested. It has a minor breaking change.

Before, you'd do SmfStorageClient::make_shared(opts..); now you do auto client = seastar::make_shared<SmfStorageClient>(opts..). We save a few bytes, remove one extra inheritance class, allow for reconnects, and correctly handle backpressure.

Merged in #289 and #284.