owais / istio-grpc-headless-test


Demonstrates gRPC + headless service issues with Istio

issue: istio/istio#49391

Steps to reproduce

  1. Deploy the server components with `kubectl apply -f server.yaml` (a rough sketch of these resources follows the steps).
  2. Switch to the istio-test namespace: `kubens istio-test`
  3. Wait for all three server pods to become healthy.
     (screenshot: step-1)
  4. Deploy the client components with `kubectl apply -f client.yaml`
  5. Observe that the client is able to send messages to the server pods: `kubectl logs -f -l=component=client`
     (screenshot: step-5)
  6. Scale down the server pods: `kubectl scale sts istio-grpc-test-server --replicas 0`
  7. Observe errors in the client pod logs.
     (screenshot: step-7)
  8. Look at the Istio proxy endpoints; most of the time it will still list the old server pod IPs as healthy: `istioctl proxy-config endpoints <client-pod-name> | grep server`
     (screenshot: step-8)
  9. Scale the server pods back up to three replicas, wait for the new pods to come up and become healthy, and take note of the new IPs: `kubectl scale sts istio-grpc-test-server --replicas 3`
     (screenshot: step-9)
  10. Describe the service to confirm that the Kubernetes service has updated its endpoints to the new pod IPs.
      (screenshot: step-10)
  11. List the client proxy endpoints again as in step 8 and notice that they still point to the old IPs.
      (screenshot: step-11)
  12. Look at the client pod logs again and confirm that the errors have not resolved, even though the replacement server pods are up and healthy.
      (screenshot: step-12)
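
For reference, the server side amounts to a headless service in front of a StatefulSet. The sketch below shows the assumed shape of server.yaml; the label keys, port number, and container image are placeholders rather than the repo's actual values (the service name, namespace, and the grpc- port prefix come from the steps and sections above).

```yaml
# Sketch of the assumed server.yaml: a headless service (clusterIP: None)
# in front of a three-replica StatefulSet. The grpc- port name prefix is
# what tells Istio to treat the port as gRPC.
apiVersion: v1
kind: Service
metadata:
  name: istio-grpc-test-server
  namespace: istio-test
spec:
  clusterIP: None
  selector:
    component: server        # placeholder label, mirroring component=client used for the client
  ports:
    - name: grpc-server      # grpc- prefix => Istio protocol selection picks gRPC
      port: 5000             # placeholder port
      targetPort: 5000
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: istio-grpc-test-server
  namespace: istio-test
spec:
  serviceName: istio-grpc-test-server
  replicas: 3
  selector:
    matchLabels:
      component: server
  template:
    metadata:
      labels:
        component: server
    spec:
      containers:
        - name: server
          image: example.com/grpc-test-server:latest   # placeholder image
          ports:
            - containerPort: 5000
```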

At this point, practically the only way to recover the service is to restart the client pod so that it picks up the new server IPs. Occasionally, making changes to the Kubernetes service resource or related Istio resources also triggers an update of the client endpoints to the new IPs, but this does not work reliably.

Deployments have the same issue

I've deployed the servers as a StatefulSet because that is the closest setup to my real-world scenario, but the workload kind doesn't really matter. I've been able to reproduce the issue with a Deployment as well, as long as the pods are exposed with a headless service (clusterIP: None).
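
For illustration, a Deployment variant that reproduces the same behaviour could look roughly like this, reusing the headless service sketched above (again a sketch with placeholder image, labels, and port, not the repo's actual manifest):

```yaml
# Same headless service as in the sketch above; only the workload kind changes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: istio-grpc-test-server
  namespace: istio-test
spec:
  replicas: 3
  selector:
    matchLabels:
      component: server
  template:
    metadata:
      labels:
        component: server
    spec:
      containers:
        - name: server
          image: example.com/grpc-test-server:latest   # placeholder image
          ports:
            - containerPort: 5000
```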

Accessing individual pods vs the service behaves the same

It doesn't matter whether the client tries to connect to the headless service (istio-grpc-test-server.istio-test.svc.cluster.local) or a specific pod (istio-grpc-test-server-0.istio-grpc-test-server.istio-test.svc.cluster.local). Both cases behave exactly the same.

What works

Headful services

When using a headful (regular ClusterIP) service instead of a headless one, none of this happens: the Istio proxy discovers new endpoints as soon as they become healthy, and it removes old endpoints as soon as server pods are deleted. This allows the client to recover once server pods become available again.

It doesn't matter whether the server pods come from a StatefulSet or a Deployment. As long as they are exposed via a headless service (clusterIP: None), I can reliably reproduce the issue.
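
For comparison, the headful variant differs from the headless service sketched earlier only in that clusterIP: None is omitted, so Kubernetes allocates a virtual IP (same placeholder names and port as above):

```yaml
# Regular (headful) ClusterIP service: identical to the headless one
# except that clusterIP: None is dropped, so a virtual IP is allocated.
apiVersion: v1
kind: Service
metadata:
  name: istio-grpc-test-server
  namespace: istio-test
spec:
  selector:
    component: server
  ports:
    - name: grpc-server
      port: 5000
      targetPort: 5000
```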

Marking service port as TCP

This only happens when the service port's application protocol is set to gRPC by prefixing the port name with grpc-. If the port name is prefixed with tcp- instead, everything works as expected, even with a headless service.
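
Concretely, the only difference is the port name that Istio uses for protocol selection. The fragments below show the two variants of the service's ports block (the port number and the name suffix are placeholders):

```yaml
# Broken: grpc- prefix, so Istio treats the port as gRPC.
ports:
  - name: grpc-server
    port: 5000
    targetPort: 5000
```

```yaml
# Works: tcp- prefix, so Istio treats the port as plain TCP,
# and endpoint updates propagate even with a headless service.
ports:
  - name: tcp-server
    port: 5000
    targetPort: 5000
```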
