cloudflare / cloudflared

Cloudflare Tunnel client (formerly Argo Tunnel)

Home Page: https://developers.cloudflare.com/cloudflare-one/connections/connect-apps/install-and-setup/tunnel-guide

📝 Using cloudflared container on AWS EKS to access VPC via WARP

dsalaza4 opened this issue · comments

Available Documentation

Hi there!

It looks like there is documentation out there explaining how to expose Kubernetes applications to the internet using cloudflared: https://developers.cloudflare.com/cloudflare-one/tutorials/many-cfd-one-tunnel/

There is also documentation that tells you how to use cloudflared to make a private network available to users via WARP: https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/private-net/cloudflared/

What I am trying to do is:

  1. Deploy cloudflared on an AWS EKS Kubernetes cluster
  2. Set {"warp-routing": {"enabled": true}}
  3. Make the AWS VPC where the cluster lives accessible via WARP
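
For reference, the remote configuration that cloudflared prints in its "Updated to new configuration" log line (see the logs further down) nests the flag as an object under "warp-routing". Below is a minimal Python sketch that checks a configuration string for it. The helper name `warp_routing_enabled` is made up for illustration and is not part of cloudflared.

```python
import json


def warp_routing_enabled(config_json: str) -> bool:
    """Return True if the config sets warp-routing.enabled to true."""
    config = json.loads(config_json)
    warp = config.get("warp-routing")
    return isinstance(warp, dict) and warp.get("enabled") is True


# Configuration string as it appears in the cloudflared log in this issue.
config = (
    '{"ingress":[{"service":"http_status:404"}], '
    '"originRequest":{}, "warp-routing":{"enabled":true}}'
)
print(warp_routing_enabled(config))  # True
```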

I couldn't find any relevant documentation for this specific use case.

Suggested Documentation
Document how to make a private network accessible via WARP using a Docker container hosted on AWS EKS.

Additional context
These are the scenarios I've tried so far:

  1. Deploying an Ubuntu EC2 machine, installing cloudflared and connecting to a dashboard-managed tunnel (no Docker) ✅
  2. Deploying an AWS ECS cluster that uses EC2 instances (no Fargate) with containers in bridge mode, running the cloudflared Docker container as a task and connecting to a dashboard-managed tunnel ✅
  3. Deploying an AWS EKS cluster that uses EC2 instances, applying a deployment that uses the cloudflared Docker container and connecting to a dashboard-managed tunnel ❌

The last one does not work: I keep getting DBG Session terminated error="session closed by remote due to terminated by edge" messages, and all connections to the VPC via WARP are immediately closed.

Relevant logs

2023-11-15T03:53:19Z INF Starting tunnel tunnelID=63293943-4050-449c-ad55-20d636fcd767
2023-11-15T03:53:19Z INF Version 2023.10.0
2023-11-15T03:53:19Z INF GOOS: linux, GOVersion: go1.20.6, GoArch: arm64
2023-11-15T03:53:19Z INF Settings: map[loglevel:debug metrics:0.0.0.0:2000 no-autoupdate:true token:*****]
2023-11-15T03:53:19Z INF Generated Connector ID: e310a55c-4e1c-4ae2-aee7-bdf9b6f0e986
2023-11-15T03:53:19Z DBG Refreshed feature account_hash=11 pq_enabled=true pq_perct=101
2023-11-15T03:53:19Z DBG Fetched protocol: quic
2023-11-15T03:53:19Z INF Initial protocol quic
2023-11-15T03:53:19Z INF ICMP proxy will use 192.168.36.42 as source for IPv4
2023-11-15T03:53:19Z INF ICMP proxy will use fe80::d06e:a7ff:feeb:21ea in zone eth0 as source for IPv6
2023-11-15T03:53:19Z WRN The user running cloudflared process has a GID (group ID) that is not within ping_group_range. You might need to add that user to a group within that range, or instead update the range to encompass a group the user is already in by modifying /proc/sys/net/ipv4/ping_group_range. Otherwise cloudflared will not be able to ping this network error="Group ID 65532 is not between ping group 1 to 0"
2023-11-15T03:53:19Z DBG ICMP proxy feature is disabled error="cannot create ICMPv4 proxy: Group ID 65532 is not between ping group 1 to 0 nor ICMPv6 proxy: socket: permission denied"
2023-11-15T03:53:19Z WRN ICMP proxy feature is disabled error="cannot create ICMPv4 proxy: Group ID 65532 is not between ping group 1 to 0 nor ICMPv6 proxy: socket: permission denied"
2023-11-15T03:53:19Z DBG edge discovery: looking up edge SRV record domain=_v2-origintunneld._tcp.argotunnel.com event=0
2023-11-15T03:53:19Z DBG edge discovery: resolved edge addresses addresses=["198.41.192.107","198.41.192.227","198.41.192.37","198.41.192.167","198.41.192.7","198.41.192.67","198.41.192.77","198.41.192.57","198.41.192.47","198.41.192.27","2606:4700:a0::6","2606:4700:a0::7","2606:4700:a0::8","2606:4700:a0::3","2606:4700:a0::2","2606:4700:a0::10","2606:4700:a0::4","2606:4700:a0::9","2606:4700:a0::5","2606:4700:a0::1"] event=0
2023-11-15T03:53:19Z DBG edge discovery: resolved edge addresses addresses=["198.41.200.233","198.41.200.13","198.41.200.53","198.41.200.63","198.41.200.43","198.41.200.23","198.41.200.113","198.41.200.73","198.41.200.193","198.41.200.33","2606:4700:a8::3","2606:4700:a8::7","2606:4700:a8::1","2606:4700:a8::2","2606:4700:a8::10","2606:4700:a8::9","2606:4700:a8::5","2606:4700:a8::8","2606:4700:a8::4","2606:4700:a8::6"] event=0
2023-11-15T03:53:19Z DBG edge discovery: looking up edge SRV record domain=_v2-origintunneld._tcp.argotunnel.com event=0
2023-11-15T03:53:19Z INF Starting metrics server on [::]:2000/metrics
2023-11-15T03:53:19Z DBG edge discovery: resolved edge addresses addresses=["198.41.192.47","198.41.192.227","198.41.192.37","198.41.192.67","198.41.192.77","198.41.192.57","198.41.192.167","198.41.192.27","198.41.192.107","198.41.192.7","2606:4700:a0::10","2606:4700:a0::9","2606:4700:a0::6","2606:4700:a0::1","2606:4700:a0::2","2606:4700:a0::7","2606:4700:a0::5","2606:4700:a0::4","2606:4700:a0::8","2606:4700:a0::3"] event=0
2023-11-15T03:53:19Z DBG edge discovery: resolved edge addresses addresses=["198.41.200.43","198.41.200.193","198.41.200.33","198.41.200.113","198.41.200.233","198.41.200.73","198.41.200.53","198.41.200.23","198.41.200.13","198.41.200.63","2606:4700:a8::3","2606:4700:a8::5","2606:4700:a8::1","2606:4700:a8::2","2606:4700:a8::6","2606:4700:a8::7","2606:4700:a8::10","2606:4700:a8::9","2606:4700:a8::8","2606:4700:a8::4"] event=0
2023-11-15T03:53:19Z DBG edge discovery: giving new address to connection connIndex=0 event=0 ip=198.41.192.227
2023/11/15 03:53:19 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Receive-Buffer-Size for details.
2023-11-15T03:53:19Z DBG QUIC TLS event curve=X25519Kyber768Draft00 handshake=true handshake_duration="897.529µs"
2023-11-15T03:53:20Z INF Registered tunnel connection connIndex=0 connection=239c3569-55bc-40f3-b317-57ed331dc0f0 event=0 ip=198.41.192.227 location=iad02 protocol=quic
2023-11-15T03:53:20Z DBG edge discovery: giving new address to connection connIndex=1 event=0 ip=198.41.200.113
2023-11-15T03:53:20Z DBG QUIC TLS event curve=X25519Kyber768Draft00 handshake=true handshake_duration=1.009407ms
2023-11-15T03:53:20Z INF Registered tunnel connection connIndex=1 connection=683963ea-3b23-43e4-ae17-51a6ff4c416a event=0 ip=198.41.200.113 location=ord02 protocol=quic
2023-11-15T03:53:21Z DBG edge discovery: giving new address to connection connIndex=2 event=0 ip=198.41.192.167
2023-11-15T03:53:21Z DBG QUIC TLS event curve=X25519Kyber768Draft00 handshake=true handshake_duration="839.945µs"
2023-11-15T03:53:21Z INF Registered tunnel connection connIndex=2 connection=c4c5244b-4cc5-45f4-8032-da1885adb3fc event=0 ip=198.41.192.167 location=iad03 protocol=quic
2023-11-15T03:53:22Z INF Updated to new configuration config="{\"ingress\":[{\"service\":\"http_status:404\"}], \"originRequest\":{}, \"warp-routing\":{\"enabled\":true}}" version=2
2023-11-15T03:53:22Z DBG edge discovery: giving new address to connection connIndex=3 event=0 ip=198.41.200.193
2023-11-15T03:53:22Z DBG QUIC TLS event curve=X25519Kyber768Draft00 handshake=true handshake_duration="866.915µs"
2023-11-15T03:53:22Z INF Registered tunnel connection connIndex=3 connection=c4d5bc4b-d694-4246-b66b-9695c0aa198f event=0 ip=198.41.200.193 location=ord10 protocol=quic
2023-11-15T03:56:00Z DBG Registered session connIndex=0 dst=192.168.0.2:53 event=3 ip=198.41.192.227 sessionID=3b6e44ff-8269-4817-af5d-177c0bd2be23 src=192.168.36.42:43162
2023-11-15T03:56:04Z DBG Registered session connIndex=0 dst=192.168.0.2:53 event=3 ip=198.41.192.227 sessionID=526e78a9-22ce-4b8c-8ab9-495ece64d3e1 src=192.168.36.42:59131
2023-11-15T03:56:05Z DBG Destination connection closed connIndex=0 event=3 ip=198.41.192.227 sessionID=3b6e44ff-8269-4817-af5d-177c0bd2be23
2023-11-15T03:56:05Z DBG Session terminated error="session closed by remote due to terminated by edge" connIndex=0 event=3 ip=198.41.192.227 sessionID=3b6e44ff-8269-4817-af5d-177c0bd2be23
2023-11-15T03:56:10Z DBG Destination connection closed connIndex=0 event=3 ip=198.41.192.227 sessionID=526e78a9-22ce-4b8c-8ab9-495ece64d3e1
2023-11-15T03:56:10Z DBG Session terminated error="session closed by remote due to terminated by edge" connIndex=0 event=3 ip=198.41.192.227 sessionID=526e78a9-22ce-4b8c-8ab9-495ece64d3e1

I think this is a mix between missing documentation and some sort of bug?

The tunnel never goes down and looks healthy on the dashboard the whole time, but connections are still terminated immediately.
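
To put a number on "immediately", one option is to pair each "Registered session" line with its "Session terminated" line and compute how long the session lived. The sketch below is a rough Python helper for that, assuming the debug log format shown above. The function name `session_lifetimes` and the regular expression are illustrative and not part of cloudflared.

```python
import re
from datetime import datetime
from typing import Dict

# Matches the two DBG events of interest and captures timestamp and session ID.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) DBG (?P<event>Registered session|Session terminated)\b.*?"
    r"sessionID=(?P<sid>[0-9a-f-]+)"
)


def session_lifetimes(log_text: str) -> Dict[str, float]:
    """Map each session ID to the seconds between registration and termination."""
    started: Dict[str, datetime] = {}
    lifetimes: Dict[str, float] = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%SZ")
        sid = match["sid"]
        if match["event"] == "Registered session":
            started[sid] = ts
        elif sid in started:
            lifetimes[sid] = (ts - started[sid]).total_seconds()
    return lifetimes


# Four lines copied from the log in this issue.
sample = """
2023-11-15T03:56:00Z DBG Registered session connIndex=0 dst=192.168.0.2:53 event=3 ip=198.41.192.227 sessionID=3b6e44ff-8269-4817-af5d-177c0bd2be23 src=192.168.36.42:43162
2023-11-15T03:56:04Z DBG Registered session connIndex=0 dst=192.168.0.2:53 event=3 ip=198.41.192.227 sessionID=526e78a9-22ce-4b8c-8ab9-495ece64d3e1 src=192.168.36.42:59131
2023-11-15T03:56:05Z DBG Session terminated error="session closed by remote due to terminated by edge" connIndex=0 event=3 ip=198.41.192.227 sessionID=3b6e44ff-8269-4817-af5d-177c0bd2be23
2023-11-15T03:56:10Z DBG Session terminated error="session closed by remote due to terminated by edge" connIndex=0 event=3 ip=198.41.192.227 sessionID=526e78a9-22ce-4b8c-8ab9-495ece64d3e1
"""

for sid, seconds in session_lifetimes(sample).items():
    print(sid, seconds)  # 5.0 for the first session, 6.0 for the second
```

Both sessions in the excerpt target port 53 and end five to six seconds after they were registered.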

Hi,
I believe your tunnel is connecting to our servers, so the issue may be in the connectivity between the pods within your EKS cluster. Can you share the Kubernetes manifests that you are using?

Also, what IP addresses are you using? You should use the IPs of the pods, so all of them need to have cluster IPs. If you are trying to connect to resources outside of the EKS cluster, more AWS configuration needs to be done, because by default pods in an EKS cluster can't access resources outside of the cluster.

Finally, if you are not using it yet, try our Helm chart for cloudflared. You can render it locally and then upload the manifests to the cluster if you don't want to use Helm directly against the cluster.

Just run (commands for Linux):

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Add Cloudflare Helm Repository
helm repo add cloudflare https://cloudflare.github.io/helm-charts
# Fetch Cloudflare Charts
helm repo update
# Render cloudflare chart and submit manifests to cluster
helm template cloudflared cloudflare/cloudflare-tunnel-remote --set-string cloudflare.tunnel_token=<your-secret-token> | kubectl apply -f -

Hi @jcsf,

Here is the deployment manifest: https://gitlab.com/fluidattacks/universe/-/blob/trunk/common/k8s/infra/ztna.tf

I made sure security policies allowed communication between pods and external nodes.

Pods also get private IPs from the VPC assigned, as we use the AWS VPC CNI plugin.

Here's an experiment I did recently that makes me think this is related to cloudflared rather than to the AWS configuration:

  1. I modified the manifest I just shared to run Ubuntu instead of cloudflared:
     image = "ubuntu:20.04"
     cmd = ["sh", "-c", "sleep 99999"]
    
    This provisioned a pod with the same security settings and other configuration as my current cloudflared pod, all within the same Kubernetes cluster.
  2. Ran kubectl exec against the pod to get an interactive shell
  3. Pinged an external server that's within the same VPC but NOT within the cluster, and it responded correctly (this means communication between the pod and an external node works)
  4. Downloaded the cloudflared binary and set up the tunnel within the Ubuntu pod
  5. Tried to ping the same external server via WARP, but the connection was terminated with the error log I showed earlier

I haven't tried the cloudflared helm chart, I'll give it a shot ASAP and let you know.

Thank you so much for your time!

OK, in that case, besides trying our chart, can you also try running cloudflared with http2 instead of quic? I see some logs on our servers saying that it couldn't establish a QUIC connection, so we can try a different protocol to see what happens. To do that, add the flag --protocol http2 to your cloudflared arguments. Also, please provide the timestamps of your experiments in UTC so that I can easily find the logs of your attempts.

Hi @jcsf,

Sorry for the late reply.

I ran the container with the following arguments:

args = [
  "tunnel",
  "--no-autoupdate",
  "--protocol",
  "http2",
  "--loglevel",
  "debug",
  "--metrics",
  "0.0.0.0:2000",
  "run",
  "--token",
  var.cloudflareTunnelToken,
]

When trying to reach a server within the VPC with WARP enabled, I got the following output:

$ ping 192.168.10.37
PING 192.168.10.37 (192.168.10.37): 56 data bytes
92 bytes from 192.168.10.37: Destination Host Unreachable
92 bytes from 192.168.10.37: Destination Host Unreachable
92 bytes from 192.168.10.37: Destination Host Unreachable
92 bytes from 192.168.10.37: Destination Host Unreachable
92 bytes from 192.168.10.37: Destination Host Unreachable
92 bytes from 192.168.10.37: Destination Host Unreachable
92 bytes from 192.168.10.37: Destination Host Unreachable
$ traceroute 192.168.10.37
traceroute to 192.168.10.37 (192.168.10.37), 64 hops max
  1   172.70.53.14  62.899ms  59.515ms  59.189ms 
  2   *  *  * 
  3   *  *  *

When looking at the container logs, I could not find any relevant information:

2023-12-12T01:41:33Z INF Starting tunnel tunnelID=63293943-4050-449c-ad55-20d636fcd767
2023-12-12T01:41:33Z INF Version 2023.10.0
2023-12-12T01:41:33Z INF GOOS: linux, GOVersion: go1.20.6, GoArch: arm64
2023-12-12T01:41:33Z INF Settings: map[loglevel:debug metrics:0.0.0.0:2000 no-autoupdate:true p:http2 protocol:http2 token:*****]
2023-12-12T01:41:33Z INF Generated Connector ID: 250294ae-f3d0-42be-b7e8-2ec84be1d61e
2023-12-12T01:41:33Z DBG Refreshed feature account_hash=11 pq_enabled=true pq_perct=101
2023-12-12T01:41:33Z DBG Fetched protocol: quic
2023-12-12T01:41:33Z INF Initial protocol http2
2023-12-12T01:41:33Z INF ICMP proxy will use 192.168.24.206 as source for IPv4
2023-12-12T01:41:33Z INF ICMP proxy will use fe80::640a:30ff:fe10:64d6 in zone eth0 as source for IPv6
2023-12-12T01:41:33Z WRN The user running cloudflared process has a GID (group ID) that is not within ping_group_range. You might need to add that user to a group within that range, or instead update the range to encompass a group the user is already in by modifying /proc/sys/net/ipv4/ping_group_range. Otherwise cloudflared will not be able to ping this network error="Group ID 65532 is not between ping group 1 to 0"
2023-12-12T01:41:33Z DBG ICMP proxy feature is disabled error="cannot create ICMPv4 proxy: Group ID 65532 is not between ping group 1 to 0 nor ICMPv6 proxy: socket: permission denied"
2023-12-12T01:41:33Z WRN ICMP proxy feature is disabled error="cannot create ICMPv4 proxy: Group ID 65532 is not between ping group 1 to 0 nor ICMPv6 proxy: socket: permission denied"
2023-12-12T01:41:33Z DBG edge discovery: looking up edge SRV record domain=_v2-origintunneld._tcp.argotunnel.com event=0
2023-12-12T01:41:33Z DBG edge discovery: resolved edge addresses addresses=["198.41.192.7","198.41.192.77","198.41.192.67","198.41.192.227","198.41.192.47","198.41.192.57","198.41.192.37","198.41.192.27","198.41.192.107","198.41.192.167","2606:4700:a0::3","2606:4700:a0::8","2606:4700:a0::2","2606:4700:a0::7","2606:4700:a0::1","2606:4700:a0::5","2606:4700:a0::10","2606:4700:a0::4","2606:4700:a0::9","2606:4700:a0::6"] event=0
2023-12-12T01:41:33Z DBG edge discovery: resolved edge addresses addresses=["198.41.200.73","198.41.200.53","198.41.200.23","198.41.200.233","198.41.200.13","198.41.200.63","198.41.200.33","198.41.200.193","198.41.200.113","198.41.200.43","2606:4700:a8::1","2606:4700:a8::4","2606:4700:a8::2","2606:4700:a8::10","2606:4700:a8::6","2606:4700:a8::9","2606:4700:a8::5","2606:4700:a8::8","2606:4700:a8::3","2606:4700:a8::7"] event=0
2023-12-12T01:41:33Z DBG edge discovery: looking up edge SRV record domain=_v2-origintunneld._tcp.argotunnel.com event=0
2023-12-12T01:41:33Z INF Starting metrics server on [::]:2000/metrics
2023-12-12T01:41:33Z DBG edge discovery: resolved edge addresses addresses=["198.41.192.67","198.41.192.37","198.41.192.57","198.41.192.167","198.41.192.77","198.41.192.227","198.41.192.27","198.41.192.47","198.41.192.107","198.41.192.7","2606:4700:a0::9","2606:4700:a0::6","2606:4700:a0::1","2606:4700:a0::3","2606:4700:a0::2","2606:4700:a0::8","2606:4700:a0::5","2606:4700:a0::7","2606:4700:a0::10","2606:4700:a0::4"] event=0
2023-12-12T01:41:33Z DBG edge discovery: resolved edge addresses addresses=["198.41.200.193","198.41.200.33","198.41.200.13","198.41.200.63","198.41.200.113","198.41.200.73","198.41.200.43","198.41.200.53","198.41.200.233","198.41.200.23","2606:4700:a8::10","2606:4700:a8::2","2606:4700:a8::9","2606:4700:a8::8","2606:4700:a8::7","2606:4700:a8::3","2606:4700:a8::5","2606:4700:a8::1","2606:4700:a8::4","2606:4700:a8::6"] event=0
2023-12-12T01:41:33Z DBG edge discovery: giving new address to connection connIndex=0 event=0 ip=198.41.192.107
2023-12-12T01:41:33Z DBG Connecting via http2 connIndex=0 event=0 ip=198.41.192.107
2023-12-12T01:41:33Z INF Registered tunnel connection connIndex=0 connection=b3cfbc26-11f5-4a59-8959-5332081c9cb2 event=0 ip=198.41.192.107 location=iad09 protocol=http2
2023-12-12T01:41:33Z DBG edge discovery: giving new address to connection connIndex=1 event=0 ip=198.41.200.43
2023-12-12T01:41:33Z DBG Connecting via http2 connIndex=1 event=0 ip=198.41.200.43
2023-12-12T01:41:34Z INF Registered tunnel connection connIndex=1 connection=0c4a05e0-43c6-4442-a28f-d6c99542d91a event=0 ip=198.41.200.43 location=iad08 protocol=http2
2023-12-12T01:41:34Z DBG edge discovery: giving new address to connection connIndex=2 event=0 ip=198.41.192.37
2023-12-12T01:41:34Z DBG Connecting via http2 connIndex=2 event=0 ip=198.41.192.37
2023-12-12T01:41:35Z INF Registered tunnel connection connIndex=2 connection=29ecedaa-aba3-42e6-a265-30941a0ed75c event=0 ip=198.41.192.37 location=iad02 protocol=http2
2023-12-12T01:41:35Z INF Updated to new configuration config="{\"ingress\":[{\"service\":\"http_status:404\"}], \"originRequest\":{}, \"warp-routing\":{\"enabled\":true}}" version=2
2023-12-12T01:41:35Z DBG edge discovery: giving new address to connection connIndex=3 event=0 ip=198.41.200.193
2023-12-12T01:41:35Z DBG Connecting via http2 connIndex=3 event=0 ip=198.41.200.193
2023-12-12T01:41:35Z INF Registered tunnel connection connIndex=3 connection=2a56ed68-4f96-4ed2-bbd9-7c9090d64eb3 event=0 ip=198.41.200.193 location=iad12 protocol=http2

Interestingly, I do not get Destination Host Unreachable errors when pinging the server while using QUIC 🤔

All tests were performed between 01:00 and 02:00 UTC on December 12th.

@jcsf After some additional testing, I found out that the failing pings were caused by this warning, which leaves the ICMP proxy disabled:

WRN The user running cloudflared process has a GID (group ID) that is not within ping_group_range. You might need to add that user to a group within that range, or instead update the range to encompass a group the user is already in by modifying /proc/sys/net/ipv4/ping_group_range. Otherwise cloudflared will not be able to ping this network error="Group ID 65532 is not between ping group 1 to 0"
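
For context on the "1 to 0" in that warning: on Linux, /proc/sys/net/ipv4/ping_group_range holds two integers, the lowest and highest group ID allowed to open unprivileged ICMP sockets. The kernel default of "1 0" is an empty range, so no group qualifies. The Python sketch below illustrates that check. The helper names are hypothetical, it only looks at the primary GID, and it is not cloudflared's actual implementation.

```python
import os
from pathlib import Path

PING_GROUP_RANGE = Path("/proc/sys/net/ipv4/ping_group_range")


def gid_allowed(gid: int, range_text: str) -> bool:
    """Check whether gid falls inside a 'LOW HIGH' ping_group_range string."""
    low, high = (int(part) for part in range_text.split())
    return low <= gid <= high


def current_gid_allowed() -> bool:
    """Apply the check to this process's primary GID (Linux only)."""
    return gid_allowed(os.getgid(), PING_GROUP_RANGE.read_text())


print(gid_allowed(65532, "1 0"))           # False: the empty default range
print(gid_allowed(65532, "0 2147483647"))  # True: a range covering every GID
```

On Kubernetes, one possible way to widen the range is a pod-level securityContext.sysctls entry for net.ipv4.ping_group_range. Whether that is permitted depends on the cluster version and policy, so treat it as something to verify rather than a confirmed fix for this setup.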

What I did to test connectivity was:

  1. Turning on an additional server within the VPC
  2. Exposing an HTTP server on it
  3. Trying to access it via WARP

It worked as expected.
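
Since ICMP is not proxied in this setup, a TCP-level check is a more reliable reachability test than ping. The steps above can be approximated with a small Python sketch. The function name `tcp_reachable` is illustrative, and the port used in the commented example (80) is an assumption, since the issue does not state which port the HTTP server listened on.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


# Example, run from a WARP-connected client (IP taken from the ping test
# earlier in this thread; port 80 is assumed):
# print(tcp_reachable("192.168.10.37", 80))
```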

I will continue testing and see if I can find any misbehavior, as I remember trying the exact same thing in the past with no success 🤔

Closing this issue for now.

Thank you so much!

OK, nice to hear. Thank you!