hetznercloud / hcloud-cloud-controller-manager

Kubernetes cloud-controller-manager for Hetzner Cloud

Load balancer health checks report errors despite node ports being connectable

process0 opened this issue · comments

Creating this issue here so others don't waste time. Maybe this should be in the documentation.

Setup:

  • K8S
  • Calico
  • hcloud CCM
  • hcloud CSI
  • Istio

I annotated the istio-ingressgateway Service with all the information needed to use a pre-provisioned (terraformed) load balancer. Every HTTP, HTTPS, and TCP node port on the worker nodes was connectable, yet the HCloud load balancer health checks kept reporting them as unreachable / not healthy.
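
For reference, a hedged sketch of the kind of annotations this involves; the istio-system namespace and the load balancer name dev1-lb are placeholders for my setup, and the annotation keys are the ones documented for the hcloud CCM:

# Attach the Service to a pre-provisioned load balancer by name and
# have the LB reach the nodes over the private network:
kubectl -n istio-system annotate service istio-ingressgateway \
  load-balancer.hetzner.cloud/name=dev1-lb \
  load-balancer.hetzner.cloud/use-private-ip=true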

Running tcpdump on the interface and filtering on the node port, it's clear the health check packets never get ACKed:

root@dev1-worker-2:~# tcpdump -i ens10 port 31945
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens10, link-type EN10MB (Ethernet), capture size 262144 bytes
22:49:07.439005 IP 10.9.8.5.46310 > dev1-worker-2.cluster.local.31945: Flags [S], seq 2304935557, win 64860, options [mss 1410,sackOK,TS val 1914248445 ecr 0,nop,wscale 7], length 0
22:49:08.461531 IP 10.9.8.5.46310 > dev1-worker-2.cluster.local.31945: Flags [S], seq 2304935557, win 64860, options [mss 1410,sackOK,TS val 1914249468 ecr 0,nop,wscale 7], length 0
22:49:10.477634 IP 10.9.8.5.46310 > dev1-worker-2.cluster.local.31945: Flags [S], seq 2304935557, win 64860, options [mss 1410,sackOK,TS val 1914251484 ecr 0,nop,wscale 7], length 0

The source of those SYNs, 10.9.8.5, is the load balancer's private IP, and that same IP turned out to be bound on every node via the kube-ipvs0 dummy interface:

6: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 0e:83:d9:ad:a1:e3 brd ff:ff:ff:ff:ff:ff
    ...
    inet 10.9.8.5/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever

That made it clear why: a local route for the HCloud load balancer IP had been created on each worker node, so replies to the health checks were delivered locally and the HCloud load balancer never received a response.
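
The offending route shows up in the local routing table; a hedged example of what it looks like (the exact proto/src fields may differ):

root@dev1-worker-2:~# ip route show table local | grep 10.9.8.5
local 10.9.8.5 dev kube-ipvs0 proto kernel scope host src 10.9.8.5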

I came across this useful comment, #58 (comment), quoted below:

I think I'm hitting a similar issue and did take a deeper look into it. Actually, the issue seems to be "quite known" in the Kubernetes community (metallb/metallb#153, kubernetes/kubernetes#79976, kubernetes/kubernetes#66607, kubernetes/kubernetes#92312, kubernetes/enhancements#1392, kubernetes/kubernetes#79783, kubernetes/kubernetes#59976).

TLDR: I think the problem is the following:

Using hcloud-cloud-controller-manager, LoadBalancer services learn their external IPs. kube-proxy in ipvs mode adds this IP to the kube-ipvs0 interface to allow cluster-internal access to the LoadBalancer, and a local route pointing to this IP is created on all nodes (ip route show table local). If the (Hetzner) Load Balancer now sends a health check packet, the cluster's reply stays within the cluster, because the route points at the kube-ipvs0 interface instead of the internal network's network card.
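
One way to see the effect on a node (a hedged example; the output format varies by iproute2/kernel version): asking the kernel how it would route a packet to the LB IP shows local delivery rather than the uplink:

root@dev1-worker-2:~# ip route get 10.9.8.5
local 10.9.8.5 dev lo table local src 10.9.8.5 uid 0
    cache <local>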

There are a lot of solutions under discussion, but as far as I know, nothing helpful so far. The only workaround seems to be to use iptables instead of ipvs as the kube-proxy mode (I haven't tried it with a Hetzner Load Balancer yet). However, this comes with a performance drawback (https://www.projectcalico.org/comparing-kube-proxy-modes-iptables-or-ipvs/).
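
For a kubeadm-style cluster, a hedged sketch of that switch (other setups store the kube-proxy configuration elsewhere):

# Change kube-proxy's mode from "ipvs" to "iptables" in its config ...
kubectl -n kube-system edit configmap kube-proxy
# ... then restart the kube-proxy pods so the change takes effect:
kubectl -n kube-system rollout restart daemonset kube-proxy
# Note: leftover ipvs state (the kube-ipvs0 interface, ipvs rules) may
# still need cleaning up, e.g. by rebooting the nodes.
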
As a very dirty hack and experiment, I temporarily removed the local route (ip route del local $internal_loadbalancer_ip dev kube-ipvs0 table local) and the health checks immediately turned green. However, this ugly workaround will not survive a reboot.

I'm currently reading about the possibility of replacing kube-proxy/ipvs with Cilium, but I've only just started trying to understand things there... For now, I guess, only iptables will "work". But I'm happy to discuss and work with you and Hetzner staff to find a solution.

And a fix: #58 (comment), which is to annotate the Service with load-balancer.hetzner.cloud/hostname: your-ingress.acme.corp, so that the Service status carries a hostname instead of the load balancer IP and kube-proxy never binds that IP to kube-ipvs0.
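
Applied to the setup above, a hedged example (the istio-system namespace is assumed; use a DNS name that resolves to the load balancer's IP):

kubectl -n istio-system annotate service istio-ingressgateway \
  load-balancer.hetzner.cloud/hostname=your-ingress.acme.corp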
