kube-image-keeper-registry-0 goes into a CrashLoopBackOff when using AWS EBS persistent storage
mccullough-ea opened this issue
Sometimes when the kube-image-keeper-registry-0 pod is restarted, it goes into a CrashLoopBackOff with something like this at the end of its logs:
```
garbage-collector docker.elastic.co/beats/filebeat: marking blob sha256:89732bc7504122601f40269fc9ddfb70982e633ea9caf641ae45736f2846b004
garbage-collector docker.io/jgraph/drawio
garbage-collector manifest eligible for deletion: sha256:fb2a84c7a2e04d4ea2e5aa0c57385e0e61dd3c7c5ea559a09d5a3a2cca6de28f
```
I haven't found any errors in the logs, but they always end with "garbage-collector manifest eligible for deletion".
My workaround is to delete the PVC and then restart the pod again, so there must be something on the volume that breaks it.
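In case it's useful, this is roughly what I do (a sketch; the namespace and the PVC name placeholder are assumptions, since StatefulSet PVCs are named `<claim-template>-<pod-name>` and may differ in your cluster):

```sh
# Grab the logs of the crashed container first, in case there is more to see.
kubectl -n kube-image-keeper logs kube-image-keeper-registry-0 --previous

# Find the registry's PVC, then delete it followed by the pod. The PVC stays
# in Terminating until the pod is gone; the StatefulSet then recreates both
# with a fresh volume.
kubectl -n kube-image-keeper get pvc
kubectl -n kube-image-keeper delete pvc <registry-pvc-name>
kubectl -n kube-image-keeper delete pod kube-image-keeper-registry-0
```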
We deploy it using the kube-image-keeper Helm chart v1.4.0 from https://charts.enix.io/.
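For reference, the install looks roughly like this (a sketch; the repo alias, release name, and namespace are our own choices):

```sh
# Add the Enix chart repository and install the chart at the version we use.
helm repo add enix https://charts.enix.io
helm upgrade --install kube-image-keeper enix/kube-image-keeper \
  --version 1.4.0 \
  --namespace kube-image-keeper --create-namespace
```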
It is hosted in EKS with k8s version 1.24 and uses the following storage class:
```yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
  name: encrypted-ebs
parameters:
  csi.storage.k8s.io/fstype: ext4
  encrypted: "true"
  type: gp3
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```
Any help would be greatly appreciated.
Hello @mccullough-ea, can you please test again with our latest beta release (1.5.0-beta.1, published yesterday), or update the deployment and set the registry.persistence.deleteUntagged Helm value to false?
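Applied as a Helm upgrade, the second option could look roughly like this (a sketch; the release and namespace names are assumptions taken from the install above):

```sh
# Flip the garbage-collection setting on the existing release,
# keeping all other values as they are.
helm upgrade kube-image-keeper enix/kube-image-keeper \
  --namespace kube-image-keeper \
  --reuse-values \
  --set registry.persistence.deleteUntagged=false
```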
Thanks for the suggestion, 1.5.0-beta.1 seems to have fixed it! I'll keep testing just in case I got lucky...
@Nicolasgouze Thanks again for the suggestion, I can't seem to break it any more no matter how hard I try! Closing the issue.