enix / kube-image-keeper

kuik is a container image caching system for Kubernetes

How to configure the proxy bind address?

jperville opened this issue

I am evaluating kuik v1.4.0 on different Kubernetes clusters.

It works very well on minikube (minikube v1.31.1, k8s v1.26.6, containerd runtime),
but I am having hostPort/hostIP issues on a kubespray-deployed VM (kubespray v2.22.1, k8s v1.26.5, cri-o runtime) and on minikube with the cri-o runtime.

In both cases, I deploy kuik on the cluster using the helm chart from this project.

Symptoms

Here are the symptoms on kubespray cluster with crio runtime:

  • all kuik pods are up and pass their readiness probe
  • pods which use cached images have status ErrImagePull or ImagePullBackOff
  • if I kubectl describe one of the pods, I can see the following message: pinging container registry localhost:7439: Get "http://localhost:7439/v2/": dial tcp 127.0.0.1:7439: connect: no route to host

I believe the reason is an unfixed cri-o issue: cri-o/cri-o#1804, but it is too complex for me to fix myself. To sum up the problem: the proxy daemonset is configured with hostPort: 7439 and hostIP: 127.0.0.1, but port-forwarding from a pod to the host is currently broken with cri-o.

As a workaround, I would like to be able to run the proxy daemonset on the host network, listening on 127.0.0.1:7439.
Today, the proxy daemonset listens on port 8082 on all interfaces, and this is hardcoded: https://github.com/enix/kube-image-keeper/blob/v1.4.0/internal/proxy/server.go#L108-L110

Would you accept a pull request to make the proxy bind address configurable (with defaults compatible with the existing behavior)? That would work around my issue, and the helm chart could be updated to listen on the host network as an alternative to the current hostIP/hostPort approach.

Some troubleshooting

My minikube start command-line:

minikube start \
  --driver=virtualbox \
  --host-only-cidr=192.168.99.1/24 \
  --memory=10240 \
  --cpus=8 \
  --kubernetes-version=1.26.6 \
  --service-cluster-ip-range=10.96.0.0/12 \
  --docker-opt bip=172.17.0.1/20 \
  --extra-config=kubelet.authentication-token-webhook=true \
  --extra-config=kubelet.authorization-mode=Webhook \
  --extra-config=kubelet.max-pods=110 \
  --extra-config=apiserver.enable-admission-plugins=AlwaysPullImages,PodNodeSelector \
  --extra-config=scheduler.bind-address=0.0.0.0 \
  --extra-config=controller-manager.bind-address=0.0.0.0 \
  --addons ingress \
  --addons storage-provisioner \
  --container-runtime=cri-o

Note the --container-runtime=cri-o option (if not specified, the runtime will be containerd, which works).

Then I apply kuik on the cluster using helm as usual.

Pod status on the cluster

# kube-image-keeper pods are up
$ kubectl get pod -n kube-image-keeper
NAME                                             READY   STATUS    RESTARTS      AGE
kube-image-keeper-controllers-5f69d66fdc-tbgfg   1/1     Running   1 (14h ago)   14h
kube-image-keeper-controllers-5f69d66fdc-xx4fc   1/1     Running   0             14h
kube-image-keeper-proxy-rg876                    1/1     Running   0             14h
kube-image-keeper-registry-0                     1/1     Running   0             14h

# create a test deployment which uses a cached docker image
$ kubectl create deployment mydeploy --image docker.io/busybox -- nc -lp 1337

# after waiting a bit, the mydeploy pod cannot pull its image
$ kubectl get pod -l app=mydeploy
NAME                       READY   STATUS             RESTARTS   AGE
mydeploy-8b8f68f58-q72pd   0/1     ImagePullBackOff   0          7m40s

$ kubectl describe pod -l app=mydeploy | tail -n3
  Warning  Failed     5m48s (x4 over 7m57s)  kubelet            Error: ErrImagePull
  Warning  Failed     5m37s (x6 over 7m57s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    3m2s (x16 over 7m57s)  kubelet            Back-off pulling image "localhost:7439/docker.io/busybox"

If I directly try to pull the image inside the VM:

# crictl pull localhost:7439/docker.io/busybox
E1213 11:22:38.682320   22294 remote_image.go:242] "PullImage from image service failed" err="rpc error: code = Unknown desc = pinging container registry localhost:7439: Get \"http://localhost:7439/v2/\": dial tcp 127.0.0.1:7439: connect: no route to host" image="localhost:8082/docker.io/busybox"
FATA[0012] pulling image: rpc error: code = Unknown desc = pinging container registry localhost:7439: Get "http://localhost:7439/v2/": dial tcp 127.0.0.1:7439: connect: no route to host 

# curl -sSL -x '' --fail localhost:7439
curl: (7) Failed connect to localhost:7439; No route to host

# using the proxy pod IP address works
# curl -sSL -x '' --fail 10.233.105.77:7439
curl: (22) The requested URL returned error: 404 Not Found

Hi!
Same issue for me on a fresh kubeadm cluster (v1.29.0), also using cri-o (v1.29.0) and the Calico CNI (v3.26.4).

Hello,

A pull request would be very much appreciated! However, beware that the metrics port will also be exposed on the host network, so it should be configurable too to avoid port collisions with other services. Also, I don't know the implications of this change with regard to using a PodMonitor to scrape metrics. If you can address those two points, I will be happy to review and merge your PR.

Hello @paullaffitte, actually the metrics port is already configurable (via the --metrics-bind-address option).

In PR #235 I tried to make the behavior consistent (making both the metrics bind address AND the proxy bind address configurable, not only the former).

Resolved by #235, closing

PS: I just added one commit after merging your PR to address a small issue with configuring the readiness probe. It wasn't really a bug, but it required extra configuration that could be avoided, and I found it easier to fix myself than to explain. Your work is still very much appreciated, thanks :)

Thanks a lot @paullaffitte. Looking forward to seeing all of this in the upcoming release.