telepresenceio / telepresence

Local development against a remote Kubernetes or OpenShift cluster

Home Page: https://www.telepresence.io


Traffic-agent sidecar does not appear when traffic-manager is installed in an istio-enabled namespace

alextricity25 opened this issue

Describe the bug
Hey guys 👋. I have a pod which I'm trying to intercept. It runs in an environment with Istio configured. Normally, this pod has two containers: the container running my app code, and the Istio proxy sidecar. When I intercept the pod, I see that the traffic manager spins up a new pod, but the traffic-agent container never appears. The pod should now have three containers, correct? 1. app 2. istio sidecar 3. traffic agent. However, I only see two containers running, and no traffic agent :/

I looked at the traffic-manager logs (running in debug mode), and these messages stick out to me:

traffic-manager 2024/03/29 20:59:57 http: TLS handshake error from 127.0.0.6:50859: EOF
traffic-manager 2024/03/29 20:59:57 http: TLS handshake error from 127.0.0.6:38917: EOF
traffic-manager 2024/03/29 20:59:58 http: TLS handshake error from 127.0.0.6:37491: EOF
traffic-manager 2024/03/29 21:00:02 http: TLS handshake error from 127.0.0.6:54201: EOF
traffic-manager 2024-03-29 21:00:03.1760 info    httpd/conn=127.0.0.1:8081 : Warning Unhealthy Readiness probe failed: HTTP probe failed with statuscode: 503 : session_id="2342f802-490b-48dd-944d-b3813e1a24d6"
traffic-manager 2024-03-29 21:00:04.4588 debug   podWatcher calling updateSubnets with [10.252.0.0/23]
traffic-manager 2024-03-29 21:00:05.1666 info    httpd/conn=127.0.0.1:8081 : Warning Unhealthy Readiness probe failed: HTTP probe failed with statuscode: 503 : session_id="2342f802-490b-48dd-944d-b3813e1a24d6"
traffic-manager 2024-03-29 21:00:07.1741 info    httpd/conn=127.0.0.1:8081 : Warning Unhealthy Readiness probe failed: HTTP probe failed with statuscode: 503 : session_id="2342f802-490b-48dd-944d-b3813e1a24d6"

Nothing else in the logs looks glaring to me.
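One detail worth noting in those logs: 127.0.0.6 is the source address Istio's sidecar typically uses for inbound passthrough traffic, so the TLS handshake errors suggest the istio-proxy is sitting in front of the traffic-manager's mTLS endpoint. A quick sketch for pulling those signals out of a saved copy of the log (tm.log is a hypothetical filename, not something telepresence produces):

```shell
# Count Istio-sourced TLS handshake failures and readiness-probe failures
# in a saved traffic-manager log. 'tm.log' is a hypothetical filename.
grep -c 'TLS handshake error from 127.0.0.6' tm.log
grep -c 'Readiness probe failed' tm.log
```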

Some other things to note:

When I run the traffic manager outside my service mesh (so that the traffic manager doesn't get an Istio sidecar), everything works just fine, but my app runs extremely slowly. So I'm now trying to run telepresence in the same namespace as my app, which has the Istio service mesh configured.

telepresence_logs.zip

To Reproduce
Steps to reproduce the behavior:

  1. When I run telepresence intercept xrdm-portal --port 80:80 --docker-build ./devops/local-development --docker-build-opt file=./devops/local-development/Dockerfile.portal-web-watch -- --rm --name blah -e WATCH=true -v ./apps/portal:/app/ IMAGE
  2. I see
Connected to context vcluster_vcluster-d4400caa_telepresence-04-29-03-vcluster_gke_xrdm-dev_us-central1_shared-review-cluster-7039273, namespace xrdm (https://telepresence-04-29-03-shared-cluster.xrdm.dev)
telepresence intercept: error: connector.CreateIntercept: request timed out while waiting for agent xrdm-portal.xrdm to arrive: Events that may be relevant:
AGE     TYPE      REASON      OBJECT                             MESSAGE
1m54s   Warning   Unhealthy   pod/xrdm-portal-58fb57497f-zsp7n   Readiness probe failed: HTTP probe failed with statuscode: 503
1m54s   Warning   Unhealthy   pod/xrdm-portal-58fb57497f-zsp7n   Readiness probe failed: HTTP probe failed with statuscode: 503
1m54s   Warning   Unhealthy   pod/xrdm-portal-58fb57497f-zsp7n   Readiness probe failed: HTTP probe failed with statuscode: 503

  3. Inspect the traffic manager logs in the cluster
  4. See error

Expected behavior
The pod running my app would run with three containers.

  1. Istio proxy
  2. my app code
  3. traffic agent
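The expected container set above can be checked directly; here's a sketch (the pod name and namespace are taken from the event output earlier and would differ in another environment):

```shell
# List the names of the containers in the intercepted pod. With a working
# intercept you'd expect app, istio-proxy, and traffic-agent to all appear.
kubectl get pod xrdm-portal-58fb57497f-zsp7n -n xrdm \
  -o jsonpath='{range .spec.containers[*]}{.name}{"\n"}{end}'
```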

Versions (please complete the following information):

  • Output of telepresence version
OSS Client             : v2.18.0
OSS Daemon in container: v2.18.0
Traffic Manager        : v2.18.0
Traffic Agent          : not reported by traffic-manager
  • Operating system of workstation running telepresence commands
macOS 14.3.1
  • Kubernetes environment and Version [e.g. Minikube, bare metal, Google Kubernetes Engine]
1.27.8-gke.1067004

I've also tried this with the Trial Plan on v2.19.0

Client             : v2.19.0
Daemon in container: v2.19.0
Traffic Manager    : v2.19.0
Traffic Agent      : docker.io/datawire/ambassador-telepresence-agent:1.14.5

It seems like running the traffic-manager with an istio-proxy sidecar is what causes this issue. When I remove the istio-proxy sidecar from the traffic-manager, telepresence sets up the agent sidecar on my app pod successfully.
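If the istio-proxy on the traffic-manager pod is indeed the trigger, one workaround worth trying (a sketch, not an official fix) is to keep the traffic-manager in the meshed namespace but opt just its pod out of sidecar injection using Istio's standard annotation. The deployment name and namespace below are assumptions for this setup:

```shell
# Exclude only the traffic-manager pod from Istio sidecar injection.
# 'sidecar.istio.io/inject: "false"' is Istio's standard opt-out annotation;
# the deployment name and namespace are assumed for this environment.
kubectl patch deployment traffic-manager -n xrdm --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"sidecar.istio.io/inject":"false"}}}}}'
```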

Hi @alextricity25, you're getting a readiness probe error which is probably related to the Istio sidecar. It's interesting that it works outside the mesh, but once inside the mesh you start getting these probe failures. Have you configured this? If not, can you try it and see if it helps? It should help integrate the traffic manager into Istio.

Hi @cindymullins-dw,

Thank you for your reply. Yes, I've configured the serviceMesh to be of type istio when installing the helm chart. My service also uses symbolic ports. Here are all of my helm chart's values for reference:

agent:
  image:
    name: ambassador-telepresence-agent
ambassador-agent:
  enabled: false
image:
  registry: docker.io/datawire
  tag: 2.19.0
systemaHost: app.getambassador.io
systemaPort: "443"
trafficManager:
  serviceMesh:
    type: istio
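For completeness, these values would be applied with something like the following (the chart repo, release name, and namespace are assumptions; adjust to however the chart was originally installed):

```shell
# Apply the values above to the traffic-manager release.
# Release name, chart reference, and namespace are assumed here.
helm repo add datawire https://app.getambassador.io
helm upgrade --install traffic-manager datawire/telepresence \
  --namespace xrdm -f values.yaml
```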

@alextricity25 the logs you provided are client-only. As such, they don't tell us anything about what's going on with the traffic-agent. Would it be possible for you to include the logs from the traffic-manager and the failing pod? Also, using Helm value logLevel=debug while producing those logs would be helpful.
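A sketch of the log collection being asked for here: telepresence's built-in gather-logs command bundles client, daemon, traffic-manager, and traffic-agent logs into one zip (the release name and namespace in the Helm command are assumptions for this setup):

```shell
# Turn on debug logging in the traffic-manager, reproduce the failing
# intercept, then bundle all relevant logs into a single zip for the issue.
helm upgrade traffic-manager datawire/telepresence \
  --namespace xrdm --reuse-values --set logLevel=debug
telepresence gather-logs --output-file ./telepresence_logs.zip
```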

Closing this due to lack of response.

Apologies for not getting back to you with this. I haven't been able to spin up a new environment to test this yet, but once I get the chance I'll post the results here!