telepresenceio / telepresence

Local development against a remote Kubernetes or OpenShift cluster

Home Page:https://www.telepresence.io

Connecting in Windows WSL with --docker option does not succeed

venelin-rangelov opened this issue

We have a business account, and I'm using Windows 11 with a WSL 2 Ubuntu distribution (latest version).
Telepresence is up to date at v2.19.5.

I first run telepresence login and log in through the browser to be sure, and I'm also connected to my cluster in AWS.

The problem is strange: when I run telepresence connect inside WSL, it connects, but when I run telepresence connect --docker, I get this error:
telepresence connect: error: connector.Connect: initial cluster check failed: Get "https://xxxxxxxxx.gr7.us-east-2.eks.amazonaws.com/version": getting credentials: exec: executable telepresence failed with exit code 1

The logs show this:

2024-05-27 10:10:03.0912 info    ---
2024-05-27 10:10:03.0912 info    Telepresence Connector v2.19.5 (api v3) starting...
2024-05-27 10:10:03.0912 info    PID is 1
2024-05-27 10:10:03.0912 info
2024-05-27 10:10:03.0929 info    docker/server-grpc : gRPC server started
2024-05-27 10:10:03.1259 info    docker/dev-dev13-cn : -- Starting new session
2024-05-27 10:10:03.1260 info    docker/dev-dev13-cn : Connecting to k8s cluster...
telepresence kubeauth: error: failed to get exec credentials: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 192.168.65.254:46293: connect: connection refused"
2024-05-27 10:10:03.1641 error   docker/dev-dev13-cn : unable to track k8s cluster: initial cluster check failed: Get "https://75aaa9fb4a7214d734c488b377f53f53.gr7.us-east-2.eks.amazonaws.com/version": getting credentials: exec: executable telepresence failed with exit code 1
2024-05-27 10:10:03.1642 info    docker:shutdown_logger : shutting down (gracefully)...
2024-05-27 10:10:03.1668 info    docker/dev-dev13-cn:shutdown_logger : shutting down (gracefully)...

The same steps run in a terminal on Windows itself do connect.
I also checked that my resolv.conf has the DNS server 8.8.8.8 added, so I have outside connectivity from within WSL.

Expected behavior
I should be able to connect with the --docker option both in WSL and on Windows itself.

I investigated this further. Here's what I believe happens:

The kubeconfig uses exec-type authentication that points to a binary. Something similar to:

    exec:
      apiVersion: client.authentication.k8s.io/v1
      command: aws
      args:
        - --region
        - <some region>
        - eks
        ...

When using --docker, telepresence must modify this kubeconfig because the aws command is not available in the container where the port-forward to the cluster is established. So before connecting, telepresence starts a kubeauth daemon process outside of Docker that listens on a random TCP port (46293 in the error). This process will eventually make the aws eks call described in the exec config to retrieve the credentials. Telepresence then modifies the kubeconfig to execute telepresence kubeauth <IP of Docker host>:<random TCP port>. The kubeauth subcommand connects to that port, sends a gRPC call to the kubeauth process running outside of Docker, and receives the credentials in return.
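
As a rough illustration of that rewrite, here is a sketch in Python. It is not Telepresence's actual Go code: the function name rewrite_exec_for_docker and the dict layout are made up for this example, and only the resulting command line (telepresence kubeauth <IP>:<port>) comes from the description above. The IP and port are the ones from the log in this issue.

```python
import copy


def rewrite_exec_for_docker(user: dict, docker_host_ip: str, port: int) -> dict:
    """Return a copy of a kubeconfig user entry whose exec plugin is
    replaced by a call to `telepresence kubeauth <ip>:<port>`.

    Hypothetical helper for illustration only.
    """
    rewritten = copy.deepcopy(user)
    exec_cfg = rewritten.get("exec")
    if exec_cfg is None:
        # No exec-type authentication, so there is nothing to rewrite.
        return rewritten
    # The original binary (e.g. aws) is not available in the container,
    # so the container is told to ask the kubeauth process instead.
    exec_cfg["command"] = "telepresence"
    exec_cfg["args"] = ["kubeauth", f"{docker_host_ip}:{port}"]
    return rewritten


# Mirrors the kubeconfig excerpt above; the region placeholder and the
# remaining arguments are left elided, as in the original.
original_user = {
    "exec": {
        "apiVersion": "client.authentication.k8s.io/v1",
        "command": "aws",
        "args": ["--region", "<some region>", "eks"],
    }
}

rewritten_user = rewrite_exec_for_docker(original_user, "192.168.65.254", 46293)
print(rewritten_user["exec"]["command"], *rewritten_user["exec"]["args"])
```

The copy keeps the original entry untouched, so the process outside of Docker can still run the aws command from the unmodified exec config.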

This works well in all situations except when the initial call to telepresence connect --docker is made from a host other than the Docker host. That is exactly what happens when the call is made from a Linux distribution started by WSL, because the Docker host in this case is the Windows host, not the Linux host. Consequently, telepresence kubeauth <IP of Windows host>:<random TCP port> fails, because the kubeauth process it tries to contact runs on the Linux host.
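
The "connection refused" in the log is what a TCP dial looks like when nothing listens at the dialed address. A small Python sketch of that effect follows. It is an assumption-laden stand-in, not a reproduction of the WSL setup: the helper can_reach is hypothetical, everything runs on loopback, and closing the listener stands in for dialing the Windows host, where no kubeauth process is listening.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError and timeouts alike.
        return False


# Stand-in for the kubeauth process: listen on a random free TCP port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

# Dialing the address where the listener actually runs succeeds.
reachable_while_listening = can_reach("127.0.0.1", port)

# With no listener at the dialed address, the same dial is refused.
listener.close()
reachable_without_listener = can_reach("127.0.0.1", port)

print(reachable_while_listening, reachable_without_listener)
```

A check like can_reach can also serve as a quick diagnostic: run it against the address and port that appear in the kubeauth error to see whether anything answers there.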

The solution that I'm working on will ensure that the host making the telepresence connect --docker call is used at all times.