smallrye / smallrye-stork

SmallRye Stork is a service discovery and client-side load balancing framework.

Home Page: http://smallrye.io/smallrye-stork/


refresh-period not triggering any periodic discovery

weand opened this issue

Hi,

We use Stork on Quarkus with Kubernetes service discovery.

It turns out that Stork does not honour refresh-period when using a configuration like:

quarkus.grpc.clients.fooService.host=foo
quarkus.grpc.clients.fooService.port=9000
quarkus.grpc.clients.fooService.name-resolver=stork


quarkus.stork.foo.service-discovery.type=kubernetes
quarkus.stork.foo.service-discovery.k8s-namespace=${jkube.namespace}
quarkus.stork.foo.service-discovery.application=foo-service-grpc
quarkus.stork.foo.service-discovery.refresh-period=5s

We debugged the application remotely in k8s and saw that
KubernetesServiceDiscovery.fetchNewServiceInstances(List previousInstances) is not invoked regularly, not even when there is traffic on the application.

We expected that new services get discovered automatically (after the refresh-period), e.g. when scaling up a client pod or restarting a pod. But that's obviously not the case.
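For clarity, here is a small illustration of the behaviour we expected from refresh-period, i.e. a cached instance list that is re-fetched from the cluster once it is older than the configured period. This is only our reading of the setting, not Stork's actual CachingServiceDiscovery code, and the class/field names are made up:

import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Illustration only, not Stork's implementation: the semantics we expected
// from quarkus.stork.foo.service-discovery.refresh-period=5s.
class ExpectedRefreshBehaviour {
    private final Duration refreshPeriod = Duration.ofSeconds(5);
    private Instant lastFetch = Instant.EPOCH;
    private List<String> cachedInstances = List.of();

    List<String> getServiceInstances() {
        if (Duration.between(lastFetch, Instant.now()).compareTo(refreshPeriod) >= 0) {
            // stands in for KubernetesServiceDiscovery.fetchNewServiceInstances(previousInstances)
            cachedInstances = fetchNewServiceInstances(cachedInstances);
            lastFetch = Instant.now();
        }
        return cachedInstances;
    }

    List<String> fetchNewServiceInstances(List<String> previousInstances) {
        // would query the Kubernetes endpoints here
        return previousInstances;
    }
}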

The latest Quarkus release uses Stork 1.1.2, but there doesn't seem to be any change to CachingServiceDiscovery in Stork 1.2.0.

Any advice?

Best regards
Andreas

Hi Andreas,
Thanks for reaching out. It should refresh the cache after the refresh period. Do you have a reproducer?

I don't see anything wrong in the config.

I've got nothing ready for GitHub.

But we basically follow this approach:

Inject a channel object as described here

@GrpcClient("fooService")
Channel channel;

Then on application start we create one async stub on the channel:

FooServiceStub fooService = FooServiceGrpc.newStub(channel);

The boundary of the application contains a JAX-RS service. On every REST call, we have some bidirectional communication on the stub:

fooService.someMethod(responseStreamObserver);
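
Putting the pieces together, a minimal sketch of that wiring looks roughly like this (FooRequest/FooResponse, someMethod and the observer bodies are placeholders for our actual generated classes; the jakarta.* imports would be javax.* on Quarkus 2.x):

import io.grpc.Channel;
import io.grpc.stub.StreamObserver;
import io.quarkus.grpc.GrpcClient;
import io.quarkus.runtime.StartupEvent;
import jakarta.enterprise.event.Observes;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;

@Path("/foo")
public class FooResource {

    @GrpcClient("fooService")
    Channel channel;

    private FooServiceGrpc.FooServiceStub fooService;

    // one async stub, created once on application start
    void onStart(@Observes StartupEvent event) {
        fooService = FooServiceGrpc.newStub(channel);
    }

    @GET
    public void call() {
        // response observer for the bidirectional stream
        StreamObserver<FooResponse> responseStreamObserver = new StreamObserver<>() {
            @Override public void onNext(FooResponse value) { /* handle response */ }
            @Override public void onError(Throwable t) { /* handle error */ }
            @Override public void onCompleted() { /* stream finished */ }
        };
        // bidirectional streaming call on every REST request
        StreamObserver<FooRequest> requestObserver = fooService.someMethod(responseStreamObserver);
        requestObserver.onNext(FooRequest.newBuilder().build());
        requestObserver.onCompleted();
    }
}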

OK, can you confirm that the problem is about the refreshing of the cache? I mean: are the service instances collected the first time and the cache populated with them, or are the instances never fetched from the cluster at all?

I will try to set up an equivalent scenario to see if there is a problem, but we have not detected anything recently. Can you please confirm the Quarkus and Stork versions used?

I'll be off at an event for two days next week, so you can expect some delay in my investigation.

If you have access to the cluster, could you please verify that the following command returns the instances that you expect Stork to discover?
kubectl get endpoints foo-service-grpc -n ${jkube.namespace}

OK, can you confirm that the problem is about the refreshing of the cache? I mean: are the service instances collected the first time and the cache populated with them, or are the instances never fetched from the cluster at all?

Yes, it's about refreshing the cache's view, e.g. after scaling up pods. So a basic reproducer is:

  • gRPC service started with 2 pods initially
  • start the application with the gRPC client: it load-balances properly (in a round-robin manner) between both service instances
  • scale the gRPC service up to 3 pods total: the client never calls the new pod.
# service with 2 pods started
$ oc get endpoints foo-service-grpc
NAME                               ENDPOINTS                             AGE
foo-service-grpc   10.128.4.110:9000,10.130.4.248:9000   6m6s

# scale up to 3 pods
$ oc get endpoints foo-service-grpc
NAME                               ENDPOINTS                                               AGE
foo-service-grpc   10.128.4.110:9000,10.130.4.248:9000,10.131.2.171:9000   8m46s

After the recent PR #374 we hoped for some improvement, but the issue remains. Tested with the latest Quarkus, 2.14.2.

Any workaround ideas for how to get service instances discovered properly after scaling up?

Thanks in advance.

Similar issue with Quarkus 3.0.4.

When I deploy a new version, new pods are created and the old ones are destroyed. The client application detects the new pods but then uses the old ones again.

  • Start the client; it connects to pods 2 and 3.
  • Deploy the new version of the gRPC service; pods 4 and 5 are created, pods 2 and 3 are destroyed.

Here are the logs on the client side (the logs are old, from Quarkus v2, but it is the same issue with Quarkus v3):

(client-side log screenshot attached)

Hi, sorry for the delay, I haven't been able to take a look at this because of travelling. I will do so later this week and try to provide a response.

Hi here! I was finally able to reproduce this issue. I will work on a fix at the beginning of next week.

Just to let you know that it also fixes the similar bug I had when using Quarkus v3.2.1.

Thanks a lot for the fix!