etcd-io / jetcd

Versions

etcd: 3.5.8
jetcd: 0.7.6
java: 1.8

Describe the bug
A watch created without waitForReady is sometimes not rescheduled when all servers are down. When we reboot only some of the servers, the watch streams on those servers are rescheduled as expected. When we shut down all servers and reboot them after a while, the watch is not rescheduled.

Looking at the code below, an event is lost if it arrives before the handlers are set.

jetcd/jetcd-core/src/main/java/io/etcd/jetcd/impl/WatchImpl.java

Lines 180 to 188 in c51adfb

    
    rstream = Util.applyRequireLeader(option.withRequireLeader(), stub).watch(stream -> {
        wstream.set(stream);
        stream.write(WatchRequest.newBuilder().setCreateRequest(builder).build());
    });
    rstream.handler(this::onNext);
    rstream.exceptionHandler(this::onError);
    rstream.endHandler(event -> onCompleted());

example:
connect failed -> StreamObserverReadStream.onError() -> WatcherImpl.onError() -> WatcherImpl.reschedule() -> WatcherImpl.resume() -> WatchVertxStub.watch() -> connect failed -> StreamObserverReadStream.onError() -> StreamObserverReadStream.exceptionHandler is null -> rstream.exceptionHandler(this::onError)

To Reproduce
As described above.

Expected behavior
The watch is rescheduled.

@tenghuanhe can you provide a reproducer, and/or would you be willing to work on a PR?

We are also facing the same issue. I am not able to reproduce it using Testcontainers.

This is what is happening.
To create a watch, io.vertx.grpc.stub.ClientCalls.manyToMany is called, which returns rstream, and only after that do we register the exceptionHandler. In this case the exception happens before rstream is returned, which is why StreamObserverReadStream.exceptionHandler is null. As a result the watch is not rescheduled (onError is never called).

Exception occurs in StreamObserver<I> request = delegate.apply(response);

  public static <I, O> ReadStream<O> manyToMany(ContextInternal ctx, Handler<WriteStream<I>> requestHandler, Function<StreamObserver<O>, StreamObserver<I>> delegate) {
    StreamObserverReadStream<O> response = new StreamObserverReadStream<>();
    StreamObserver<I> request = delegate.apply(response);
    requestHandler.handle(new GrpcWriteStream<>(request));
    return response;
  }
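
To show the ordering problem in isolation, here is a minimal sketch that uses neither vert.x nor jetcd. `LostErrorDemo` and `NaiveReadStream` are hypothetical names made up for this illustration; the only assumption carried over from the thread is that the stream drops an error when no exception handler is set yet.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class LostErrorDemo {
    // Hypothetical stand-in for a read stream whose handler may still be null.
    static final class NaiveReadStream {
        private Consumer<Throwable> exceptionHandler; // null until registered

        NaiveReadStream exceptionHandler(Consumer<Throwable> handler) {
            this.exceptionHandler = handler;
            return this;
        }

        // Called by the transport when the call fails.
        void onError(Throwable t) {
            if (exceptionHandler != null) {
                exceptionHandler.accept(t);
            }
            // Otherwise the error is silently dropped.
        }
    }

    // Same shape as manyToMany: create the stream, start the call (which may
    // fail synchronously), and only then return the stream to the caller.
    static NaiveReadStream open(boolean failImmediately) {
        NaiveReadStream response = new NaiveReadStream();
        if (failImmediately) {
            response.onError(new IllegalStateException("connect failed"));
        }
        return response;
    }

    // Failure happens inside open(), before the handler is registered.
    static int handlerCallsOnEarlyFailure() {
        AtomicInteger calls = new AtomicInteger();
        NaiveReadStream stream = open(true);
        stream.exceptionHandler(t -> calls.incrementAndGet()); // too late
        return calls.get();
    }

    // Failure happens after the handler is registered.
    static int handlerCallsOnLateFailure() {
        AtomicInteger calls = new AtomicInteger();
        NaiveReadStream stream = open(false);
        stream.exceptionHandler(t -> calls.incrementAndGet());
        stream.onError(new IllegalStateException("connect failed"));
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("early failure, handler calls: " + handlerCallsOnEarlyFailure()); // 0
        System.out.println("late failure, handler calls: " + handlerCallsOnLateFailure());   // 1
    }
}
```

With an early failure the handler is never invoked, so nothing would trigger a reschedule; with a late failure it runs once.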

@lburgazzoli I am not sure what the fix should be.

@giri-vsr I have very limited time at this stage, and I need to dig into the issue more to understand what to do.
If you have any time, maybe it would be good to chat with the vert.x folks to see what a solution could be.

Maybe we need to move to https://github.com/eclipse-vertx/vertx-grpc since https://github.com/vert-x3/vertx-grpc is deprecated

Can you try to work on a PR?

The issue is fixed in https://github.com/vert-x3/vertx-grpc 4.5.1; the move to https://github.com/eclipse-vertx/vertx-grpc should be handled separately.

@lburgazzoli @giri-vsr Hello, it seems that election observe has the same problem:

jetcd/jetcd-core/src/main/java/io/etcd/jetcd/impl/ElectionImpl.java

Lines 116 to 119 in 85a9a54

    
    stub.observe(request)
        .handler(value -> listener.onNext(new LeaderResponse(value, namespace)))
        .endHandler(ignored -> listener.onCompleted())
        .exceptionHandler(error -> listener.onError(toEtcdException(error)));

Maybe we can also register the end handler before sending the watch request?

The same applies to LeaseImpl and MaintenanceImpl:

jetcd/jetcd-core/src/main/java/io/etcd/jetcd/impl/LeaseImpl.java

Lines 150 to 163 in 85a9a54

    
    leaseStub
        .leaseKeepAlive(s -> {
            ref.set(s);
            s.write(req);
        })
        .handler(r -> {
            if (r.getTTL() != 0) {
                future.complete(new LeaseKeepAliveResponse(r));
            } else {
                future.completeExceptionally(
                    newEtcdException(ErrorCode.NOT_FOUND, "etcdserver: requested lease not found"));
            }
        })
        .exceptionHandler(future::completeExceptionally);

jetcd/jetcd-core/src/main/java/io/etcd/jetcd/impl/LeaseImpl.java

Lines 197 to 199 in 85a9a54

    
    leaseStub.leaseKeepAlive(this::writeHandler)
        .handler(this::handleResponse)
        .exceptionHandler(this::handleException);

jetcd/jetcd-core/src/main/java/io/etcd/jetcd/impl/MaintenanceImpl.java

Lines 122 to 136 in 85a9a54

    
    this.stub.snapshot(SnapshotRequest.getDefaultInstance())
        .handler(r -> {
            try {
                r.getBlob().writeTo(outputStream);
                bytes.addAndGet(r.getBlob().size());
            } catch (IOException e) {
                answer.completeExceptionally(toEtcdException(e));
            }
        })
        .endHandler(event -> {
            answer.complete(bytes.get());
        })
        .exceptionHandler(e -> {
            answer.completeExceptionally(toEtcdException(e));
        });

jetcd/jetcd-core/src/main/java/io/etcd/jetcd/impl/MaintenanceImpl.java

Lines 144 to 147 in 85a9a54

    
    this.stub.snapshot(SnapshotRequest.getDefaultInstance())
        .handler(r -> observer.onNext(new io.etcd.jetcd.maintenance.SnapshotResponse(r)))
        .endHandler(event -> observer.onCompleted())
        .exceptionHandler(e -> observer.onError(toEtcdException(e)));
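
All of these call sites register their handlers on a stream that has already been started. One general way a stream could tolerate that ordering is to remember a failure that arrives before any handler exists and replay it on registration. The sketch below only illustrates that idea: `BufferedErrorDemo` and `BufferingReadStream` are hypothetical names, no vert.x or jetcd API is used, and this is not necessarily how vertx-grpc 4.5.1 addresses the problem.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

class BufferedErrorDemo {
    // Hypothetical read stream that keeps an early failure until a handler exists.
    static final class BufferingReadStream {
        private Consumer<Throwable> exceptionHandler;
        private Throwable pendingError;

        synchronized BufferingReadStream exceptionHandler(Consumer<Throwable> handler) {
            this.exceptionHandler = handler;
            if (handler != null && pendingError != null) {
                Throwable pending = pendingError;
                pendingError = null;
                handler.accept(pending); // replay the failure seen before registration
            }
            return this;
        }

        synchronized void onError(Throwable t) {
            if (exceptionHandler != null) {
                exceptionHandler.accept(t);
            } else {
                pendingError = t; // keep it instead of dropping it
            }
        }
    }

    // The call fails synchronously, before the stream is returned to the caller.
    static BufferingReadStream openFailing() {
        BufferingReadStream response = new BufferingReadStream();
        response.onError(new IllegalStateException("connect failed"));
        return response;
    }

    // Registers the handler after the failure and reports what it received.
    static String deliveredMessage() {
        AtomicReference<String> message = new AtomicReference<>();
        openFailing().exceptionHandler(t -> message.set(t.getMessage()));
        return message.get();
    }

    public static void main(String[] args) {
        System.out.println("handler received: " + deliveredMessage());
    }
}
```

Here the handler still receives "connect failed" even though it was registered after the failure, so a caller such as a watcher could go on to reschedule. A real implementation would also need to decide how to treat early items and end-of-stream events, which this sketch leaves out.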



watch without waitForReady sometimes not reschedule when all servers are down