enix / kube-image-keeper

kuik is a container image caching system for Kubernetes

How to configure the proxy bind address?

jperville opened this issue

I am evaluating kuik v1.4.0 on different Kubernetes clusters.

It works very well on minikube (minikube v1.31.1, k8s v1.26.6, containerd runtime),
but I am having hostPort/hostIP issues on a kubespray-deployed VM (kubespray v2.22.1, k8s v1.26.5, cri-o runtime) and on minikube with the cri-o runtime.

In both cases, I deploy kuik on the cluster using the helm chart from this project.

Symptoms

Here are the symptoms on kubespray cluster with crio runtime:

  • all kuik pods are up and pass their readiness probe
  • pods which use cached images have status ErrImagePull or ImagePullBackOff
  • if I kubectl describe one of the pods, I can see the following message: pinging container registry localhost:7439: Get "http://localhost:7439/v2/": dial tcp 127.0.0.1:7439: connect: no route to host

I believe the reason is an unfixed cri-o issue: cri-o/cri-o#1804, but it is too complex for me to fix myself. To sum up the problem: the proxy daemonset is configured with hostPort: 7439 and hostIP: 127.0.0.1, but port-forwarding from a pod to the host is currently broken with cri-o.

As a workaround, I would like to be able to run the proxy daemonset on the host network, listening on 127.0.0.1:7439.
Today, the proxy daemonset listens on port 8082 on all interfaces, and this is hardcoded: https://github.com/enix/kube-image-keeper/blob/v1.4.0/internal/proxy/server.go#L108-L110

Would you accept a pull request to make the proxy bind address configurable (with defaults compatible with the existing behavior)? That would work around my issue, and the helm chart could be updated to listen on the host network as an alternative to the current hostIP/hostPort approach.

Some troubleshooting

My minikube start command-line:

minikube start \
  --driver=virtualbox \
  --host-only-cidr=192.168.99.1/24 \
  --memory=10240 \
  --cpus=8 \
  --kubernetes-version=1.26.6 \
  --service-cluster-ip-range=10.96.0.0/12 \
  --docker-opt bip=172.17.0.1/20 \
  --extra-config=kubelet.authentication-token-webhook=true \
  --extra-config=kubelet.authorization-mode=Webhook \
  --extra-config=kubelet.max-pods=110 \
  --extra-config=apiserver.enable-admission-plugins=AlwaysPullImages,PodNodeSelector \
  --extra-config=scheduler.bind-address=0.0.0.0 \
  --extra-config=controller-manager.bind-address=0.0.0.0 \
  --addons ingress \
  --addons storage-provisioner \
  --container-runtime=cri-o

Note the --container-runtime=cri-o option (if not specified, the runtime will be containerd, which works).

Then I apply kuik on the cluster using helm as usual.

Pod status on the cluster

# kube-image-keeper pods are up
$ kubectl get pod -n kube-image-keeper
NAME                                             READY   STATUS    RESTARTS      AGE
kube-image-keeper-controllers-5f69d66fdc-tbgfg   1/1     Running   1 (14h ago)   14h
kube-image-keeper-controllers-5f69d66fdc-xx4fc   1/1     Running   0             14h
kube-image-keeper-proxy-rg876                    1/1     Running   0             14h
kube-image-keeper-registry-0                     1/1     Running   0             14h

# create a test deployment which uses a cached docker image
$ kubectl create deployment mydeploy --image docker.io/busybox -- nc -lp 1337

# after waiting a bit, the mydeploy pod cannot pull its image
$ kubectl get pod -l app=mydeploy
NAME                       READY   STATUS             RESTARTS   AGE
mydeploy-8b8f68f58-q72pd   0/1     ImagePullBackOff   0          7m40s

$ kubectl describe pod -l app=mydeploy | tail -n3
  Warning  Failed     5m48s (x4 over 7m57s)  kubelet            Error: ErrImagePull
  Warning  Failed     5m37s (x6 over 7m57s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    3m2s (x16 over 7m57s)  kubelet            Back-off pulling image "localhost:7439/docker.io/busybox"

If I directly try to pull the image inside the VM:

# crictl pull localhost:7439/docker.io/busybox
E1213 11:22:38.682320   22294 remote_image.go:242] "PullImage from image service failed" err="rpc error: code = Unknown desc = pinging container registry localhost:7439: Get \"http://localhost:7439/v2/\": dial tcp 127.0.0.1:7439: connect: no route to host" image="localhost:8082/docker.io/busybox"
FATA[0012] pulling image: rpc error: code = Unknown desc = pinging container registry localhost:7439: Get "http://localhost:7439/v2/": dial tcp 127.0.0.1:7439: connect: no route to host 

# curl -sSL -x '' --fail localhost:7439
curl: (7) Failed connect to localhost:7439; No route to host

# using the proxy pod IP address works
# curl -sSL -x '' --fail 10.233.105.77:7439
curl: (22) The requested URL returned error: 404 Not Found

Hi!
Same issue for me on a fresh kubeadm cluster (v1.29.0), also using cri-o (v1.29.0) and the Calico CNI (v3.26.4).

Hello,

A pull request would be very much appreciated! However, beware that the metrics port will also be exposed on the host network, so it should be configurable too to avoid port collisions with other services. Also, I don't know the implications of this change with regard to using a PodMonitor to scrape metrics. If you can address those two points, I will be happy to review and merge your PR.

Hello @paullaffitte, actually the metrics port is already configurable (via the --metrics-bind-address option).

In PR #235 I tried to make the behavior consistent (making both the metrics bind address AND the proxy bind address configurable, not only the former).

Resolved by #235, closing

PS: I just added one commit after merging your PR to address a small issue with configuring the readiness probe. It wasn't really a bug, but it required extra configuration that could be avoided, and I found it easier to fix myself than to explain. Your work is still very much appreciated, thanks :)

Thanks a lot @paullaffitte. Looking forward to seeing all of this in the upcoming release.