etcdv3 / etcd-client

An etcd v3 API client

Tolerate (partial) connection failures in endpoints in the Balancer Client

rrichardson opened this issue · comments

Motivation:

We connect to a quorum of etcd servers across regions (not the recommended architecture, but it works quite well).

For various reasons, a small subset of the nodes might be unavailable.
The client should tolerate such failures and adjust the pool accordingly, if that is what the consumer of the API wants.

The current behavior comes from tower::balancer and tonic::transport::service. The discovery mechanism in balancer_channel connects "lazily" when it receives requests. It appears to connect to all endpoints, but if one fails, the entire operation fails.

It seems like the only option here is to work with the Tower team to provide a partial-success route. This is preferred not only because it is the right thing for the initial connection, but also because it should provide the proper behavior on an ongoing basis.
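
To make the idea of "partial success" concrete, here is a minimal sketch using only the Rust standard library. It is not the tower or tonic API, and it is not how etcd-client is implemented; the names `ProbeResult`, `connect_partial`, and `tcp_probe` are hypothetical and exist only for illustration. The sketch tries every endpoint, keeps the ones that respond, and reports an error only when none are reachable.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Outcome of probing a set of endpoints (hypothetical type):
/// the ones that accepted a connection, and the ones that did not,
/// each paired with its error message.
#[derive(Debug, Default)]
pub struct ProbeResult {
    pub healthy: Vec<SocketAddr>,
    pub failed: Vec<(SocketAddr, String)>,
}

/// Try every endpoint with `connect` and keep the ones that succeed.
/// Returns `Err` only when no endpoint is reachable at all, so a
/// partial failure still yields a usable pool.
pub fn connect_partial<F>(
    endpoints: &[SocketAddr],
    mut connect: F,
) -> Result<ProbeResult, String>
where
    F: FnMut(&SocketAddr) -> Result<(), String>,
{
    let mut result = ProbeResult::default();
    for ep in endpoints {
        match connect(ep) {
            Ok(()) => result.healthy.push(*ep),
            Err(e) => result.failed.push((*ep, e)),
        }
    }
    if result.healthy.is_empty() {
        Err(format!("all {} endpoints failed", endpoints.len()))
    } else {
        Ok(result)
    }
}

/// A plain TCP probe with a 500 ms timeout. This is an assumption made
/// for the sketch: a real client would perform a gRPC handshake instead.
pub fn tcp_probe(addr: &SocketAddr) -> Result<(), String> {
    TcpStream::connect_timeout(addr, Duration::from_millis(500))
        .map(|_| ())
        .map_err(|e| e.to_string())
}
```

A caller could run `connect_partial(&endpoints, tcp_probe)` at startup and hand only `healthy` to the balancer, retrying the `failed` entries later. Doing this inside the balancer itself, as proposed above, would also cover endpoints that fail after the initial connection.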

I will continue to pursue this approach, but I'd like to leave this ticket open because there will likely be some (hopefully non-breaking) changes to the etcd client to optionally utilize the partial-success behavior.

ditto~ And it's an important issue I think.