Watch without waitForReady is sometimes not rescheduled when all servers are down
tunefun opened this issue · comments
Versions
- etcd: 3.5.8
- jetcd: 0.7.6
- java: 1.8
Describe the bug
A watch without waitForReady is sometimes not rescheduled when all servers are down. When we reboot some of the servers, the watch streams on those servers are rescheduled as expected. When we shut down all servers and reboot them after a while, the watch is not rescheduled as expected.
Looking at the code below, we lose the event when it arrives before the handler is set.
jetcd/jetcd-core/src/main/java/io/etcd/jetcd/impl/WatchImpl.java
Lines 180 to 188 in c51adfb
Example call chain:
connect failed -> StreamObserverReadStream.onError() -> WatcherImpl.onError() -> WatcherImpl.reschedule() -> WatcherImpl.resume() -> WatchVertxStub.watch() -> connect failed -> StreamObserverReadStream.onError() -> StreamObserverReadStream.exceptionHandler is null -> rstream.exceptionHandler(this::onError)
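The lost event in the chain above can be illustrated with a minimal, self-contained sketch. It does not use the real vert.x or jetcd classes: `DroppingStream` and `openFailing` are hypothetical stand-ins, and the sketch assumes the stream simply ignores an error when no exception handler has been registered yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LostErrorDemo {

    // Hypothetical stand-in for a read stream that silently drops an error
    // when no exception handler has been registered yet.
    static final class DroppingStream {
        private Consumer<Throwable> exceptionHandler;

        void exceptionHandler(Consumer<Throwable> handler) {
            this.exceptionHandler = handler;
        }

        void onError(Throwable t) {
            Consumer<Throwable> handler = exceptionHandler;
            if (handler != null) {
                handler.accept(t);
            }
            // Otherwise the error is lost: nobody is notified.
        }
    }

    // Simulates opening a stream whose connection fails before the caller
    // gets the stream back (assumption made for this illustration).
    static DroppingStream openFailing() {
        DroppingStream stream = new DroppingStream();
        stream.onError(new RuntimeException("connect failed"));
        return stream;
    }

    // Opens the failing stream, then registers the handler too late.
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        DroppingStream stream = openFailing();
        stream.exceptionHandler(t -> seen.add(t.getMessage()));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("errors seen by handler: " + run().size());
    }
}
```

In this sketch the handler never sees the failure, so nothing would trigger a reschedule.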
To Reproduce
As described above.
Expected behavior
The watch is rescheduled.
@tenghuanhe can you provide a reproducer, and/or are you willing to work on a PR?
We are also facing the same issue. I am not able to reproduce it using test containers.
This is what is happening.
To create the watch, io.vertx.grpc.stub.ClientCalls.manyToMany is called, which returns rstream, and only after that do we register the exceptionHandler. In this case the exception happens before rstream is returned, which is why StreamObserverReadStream.exceptionHandler is null. As a result the watch is not rescheduled (onError is not called).
The exception occurs in StreamObserver<I> request = delegate.apply(response);
public static <I, O> ReadStream<O> manyToMany(ContextInternal ctx, Handler<WriteStream<I>> requestHandler, Function<StreamObserver<O>, StreamObserver<I>> delegate) {
    StreamObserverReadStream<O> response = new StreamObserverReadStream<>();
    StreamObserver<I> request = delegate.apply(response);
    requestHandler.handle(new GrpcWriteStream<>(request));
    return response;
}
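One conceivable way to avoid losing an error that arrives before the handler is registered is to remember it and replay it once a handler is set. The sketch below only illustrates that idea. It is not the actual vertx-grpc fix, and `ReplayingStream` and `openFailing` are hypothetical names that do not exist in vert.x or jetcd.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ReplayErrorDemo {

    // Hypothetical stream that buffers an early error instead of dropping it.
    static final class ReplayingStream {
        private Consumer<Throwable> exceptionHandler;
        private Throwable pendingError;

        void exceptionHandler(Consumer<Throwable> handler) {
            Throwable replay;
            synchronized (this) {
                this.exceptionHandler = handler;
                if (handler == null) {
                    return; // keep any pending error for a later handler
                }
                replay = pendingError;
                pendingError = null;
            }
            // Deliver outside the lock so the handler cannot deadlock on it.
            if (replay != null) {
                handler.accept(replay);
            }
        }

        void onError(Throwable t) {
            Consumer<Throwable> handler;
            synchronized (this) {
                handler = exceptionHandler;
                if (handler == null) {
                    pendingError = t; // no handler yet: remember the error
                }
            }
            if (handler != null) {
                handler.accept(t);
            }
        }
    }

    // Simulates a stream whose connection fails before it is returned.
    static ReplayingStream openFailing() {
        ReplayingStream stream = new ReplayingStream();
        stream.onError(new RuntimeException("connect failed"));
        return stream;
    }

    // Registers the handler after the failure; the error is replayed to it.
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        ReplayingStream stream = openFailing();
        stream.exceptionHandler(t -> seen.add(t.getMessage()));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("errors seen by handler: " + run());
    }
}
```

With this behaviour a late handler still receives the failure, so a caller such as the watcher's onError could go on to reschedule.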
@lburgazzoli I am not sure what the fix would be.
@giri-vsr I have very limited time at this stage and I need to dig into the issue more to understand what to do.
If you have any time, maybe it would be good to chat with the vert.x folks to see what a solution could be.
Maybe we need to move to https://github.com/eclipse-vertx/vertx-grpc since https://github.com/vert-x3/vertx-grpc is deprecated
Can you try to work on a PR?
The issue is fixed in https://github.com/vert-x3/vertx-grpc 4.5.1. The move to https://github.com/eclipse-vertx/vertx-grpc should be handled separately.
@lburgazzoli @giri-vsr hello, it seems that election observe has the same problem:
jetcd/jetcd-core/src/main/java/io/etcd/jetcd/impl/ElectionImpl.java
Lines 116 to 119 in 85a9a54
Maybe we can also register the end handler before the watch request?
The same applies to LeaseImpl and MaintenanceImpl:
jetcd/jetcd-core/src/main/java/io/etcd/jetcd/impl/LeaseImpl.java
Lines 150 to 163 in 85a9a54
jetcd/jetcd-core/src/main/java/io/etcd/jetcd/impl/LeaseImpl.java
Lines 197 to 199 in 85a9a54
jetcd/jetcd-core/src/main/java/io/etcd/jetcd/impl/MaintenanceImpl.java
Lines 122 to 136 in 85a9a54
jetcd/jetcd-core/src/main/java/io/etcd/jetcd/impl/MaintenanceImpl.java
Lines 144 to 147 in 85a9a54