nats-io / nack

NATS Controllers for Kubernetes (NACK)

Config Reloader v0.6.x Crash Loop on "Too Many Open Files"

Kampe opened this issue · comments

Posting this here since the code seems to live in this repo:
nats-io/k8s#488

Hello!

We're seeing strange issues with our config-reloader using the latest build from February (v0.6.3): the config-reloader pods will not boot, with logs stopping at:

2022/04/18 19:37:26 Starting NATS Server Reloader v0.6.3
Error: too many open files
Is anyone else seeing this? Unfortunately I'm able to reproduce this in multiple (k3s) clusters, while sometimes the container launches just fine.

I've also tried older versions (0.6.1 and 0.6.2), and unfortunately the same issue occurs.

kubectl describe output:

statefulset:

Name:               nats-edge
Namespace:          default
CreationTimestamp:  Mon, 18 Apr 2022 14:58:09 -0500
Selector:           app=nats-edge
Labels:             app=nats-edge
                    app.kubernetes.io/instance=messaging
Annotations:        <none>
Replicas:           3 desired | 3 total
Update Strategy:    RollingUpdate
  Partition:        0
Pods Status:        3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=nats-edge
  Annotations:      kubectl.kubernetes.io/restartedAt: 2022-04-19T15:06:18-05:00
  Service Account:  nats-server
  Containers:
   nats:
    Image:       nats:2.7.4
    Ports:       4222/TCP, 7422/TCP, 6222/TCP, 8222/TCP, 7777/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /nats-server
      --config
      /etc/nats-config/nats.conf
    Requests:
      cpu:      0
    Liveness:   http-get http://:8222/ delay=10s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:8222/ delay=10s timeout=5s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:            (v1:metadata.name)
      POD_NAMESPACE:       (v1:metadata.namespace)
      CLUSTER_ADVERTISE:  $(POD_NAME).nats-edge.$(POD_NAMESPACE).svc
      LEAFNODE_TOKEN:     <set to the key 'token' in secret 'nats-creds'>  Optional: false
    Mounts:
      /data/jetstream from nats-js-sts-vol (rw)
      /etc/nats-config from config-volume (rw)
      /var/run/nats from pid (rw)
   reloader:
    Image:      natsio/nats-server-config-reloader:0.6.3
    Port:       <none>
    Host Port:  <none>
    Command:
      nats-server-config-reloader
      -pid
      /var/run/nats/nats.pid
      -config
      /etc/nats-config/nats.conf
    Environment:  <none>
    Mounts:
      /etc/nats-config from config-volume (rw)
      /var/run/nats from pid (rw)
   metrics:
    Image:      natsio/prometheus-nats-exporter:0.9.1
    Port:       7777/TCP
    Host Port:  0/TCP
    Args:
      -connz
      -routez
      -serverz
      -subz
      -gatewayz
      -varz
      -channelz
      -prefix=nats
      -use_internal_server_id
      -DV
      http://localhost:8222
    Environment:  <none>
    Mounts:       <none>
  Volumes:
   config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nats-edge-config
    Optional:  false
   pid:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
Volume Claims:
  Name:          nats-js-sts-vol
  StorageClass:  
  Labels:        <none>
  Annotations:   <none>
  Capacity:      2Gi
  Access Modes:  [ReadWriteOnce]
Events:
  Type    Reason            Age                  From                    Message
  ----    ------            ----                 ----                    -------
  Normal  SuccessfulDelete  2m43s                statefulset-controller  delete Pod nats-edge-2 in StatefulSet nats-edge successful
  Normal  SuccessfulCreate  2m41s (x2 over 24h)  statefulset-controller  create Pod nats-edge-2 in StatefulSet nats-edge successful
  Normal  SuccessfulDelete  2m21s                statefulset-controller  delete Pod nats-edge-1 in StatefulSet nats-edge successful
  Normal  SuccessfulCreate  2m11s (x2 over 24h)  statefulset-controller  create Pod nats-edge-1 in StatefulSet nats-edge successful

pod:

Name:         nats-edge-1
Namespace:    default
Priority:     0
Node:         k3s-leader/192.168.1.139
Start Time:   Tue, 19 Apr 2022 15:06:50 -0500
Labels:       app=nats-edge
              controller-revision-hash=nats-edge-5bb9f677cd
              statefulset.kubernetes.io/pod-name=nats-edge-1
Annotations:  kubectl.kubernetes.io/restartedAt: 2022-04-19T15:06:18-05:00
Status:       Running
IP:           10.42.0.170
IPs:
  IP:           10.42.0.170
Controlled By:  StatefulSet/nats-edge
Containers:
  nats:
    Container ID:  containerd://be6c774e089d619efffd07358ffc31e00895731e18d83ff41339be12704e24f1
    Image:         nats:2.7.4
    Image ID:      docker.io/library/nats@sha256:6f86d5bac0f78b01cf5f9f707008403b0792ff5029817f8afc23c895c5215cc6
    Ports:         4222/TCP, 7422/TCP, 6222/TCP, 8222/TCP, 7777/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /nats-server
      --config
      /etc/nats-config/nats.conf
    State:          Running
      Started:      Tue, 19 Apr 2022 15:06:51 -0500
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      0
    Liveness:   http-get http://:8222/ delay=10s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:8222/ delay=10s timeout=5s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:           nats-edge-1 (v1:metadata.name)
      POD_NAMESPACE:      default (v1:metadata.namespace)
      CLUSTER_ADVERTISE:  $(POD_NAME).nats-edge.$(POD_NAMESPACE).svc
      LEAFNODE_TOKEN:     <set to the key 'token' in secret 'nats-creds'>  Optional: false
    Mounts:
      /data/jetstream from nats-js-sts-vol (rw)
      /etc/nats-config from config-volume (rw)
      /var/run/nats from pid (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8ccx9 (ro)
  reloader:
    Container ID:  containerd://6a4cfc309b6425d2d21dc26411caf509cc77b8191d7d3d258eb6114efacccfc9
    Image:         natsio/nats-server-config-reloader:0.6.3
    Image ID:      docker.io/natsio/nats-server-config-reloader@sha256:b310e8766c507a20dae7c634b81b49b47682e2ec985c0cd8b3c746660febb9fe
    Port:          <none>
    Host Port:     <none>
    Command:
      nats-server-config-reloader
      -pid
      /var/run/nats/nats.pid
      -config
      /etc/nats-config/nats.conf
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 19 Apr 2022 15:12:37 -0500
      Finished:     Tue, 19 Apr 2022 15:12:37 -0500
    Ready:          False
    Restart Count:  6
    Environment:    <none>
    Mounts:
      /etc/nats-config from config-volume (rw)
      /var/run/nats from pid (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8ccx9 (ro)
  metrics:
    Container ID:  containerd://b31489fbcbe29bb6daa6ac9b42c2bb7ad3961de4a54de7af9c59cc40cd57f8e5
    Image:         natsio/prometheus-nats-exporter:0.9.1
    Image ID:      docker.io/natsio/prometheus-nats-exporter@sha256:4665cdc7e749548ee9999764f29665aaeeecac95f310319016764ba2ab25e1c1
    Port:          7777/TCP
    Host Port:     0/TCP
    Args:
      -connz
      -routez
      -serverz
      -subz
      -gatewayz
      -varz
      -channelz
      -prefix=nats
      -use_internal_server_id
      -DV
      http://localhost:8222
    State:          Running
      Started:      Tue, 19 Apr 2022 15:06:51 -0500
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8ccx9 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  nats-js-sts-vol:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  nats-js-sts-vol-nats-edge-1
    ReadOnly:   false
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nats-edge-config
    Optional:  false
  pid:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-8ccx9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  7m30s                   default-scheduler  Successfully assigned default/nats-edge-1 to k3s-leader
  Normal   Pulled     7m29s                   kubelet            Container image "nats:2.7.4" already present on machine
  Normal   Created    7m29s                   kubelet            Created container nats
  Normal   Started    7m29s                   kubelet            Started container nats
  Normal   Pulled     7m29s                   kubelet            Container image "natsio/prometheus-nats-exporter:0.9.1" already present on machine
  Normal   Created    7m29s                   kubelet            Created container metrics
  Normal   Started    7m29s                   kubelet            Started container metrics
  Normal   Pulled     6m43s (x4 over 7m29s)   kubelet            Container image "natsio/nats-server-config-reloader:0.6.3" already present on machine
  Normal   Created    6m43s (x4 over 7m29s)   kubelet            Created container reloader
  Normal   Started    6m43s (x4 over 7m29s)   kubelet            Started container reloader
  Warning  BackOff    2m24s (x26 over 7m28s)  kubelet            Back-off restarting failed container

Thanks @Kampe for the report, looking into this...

Excellent, thank you!

More interestingly, sometimes you can get one or two pods of the StatefulSet to stand up:

nats-edge-0                                        3/3     Running                      0          24h
nats-edge-1                                        2/3     CrashLoopBackOff             7          15m
nats-edge-2                                        3/3     Running                      0          15m

Hi @Kampe, would it be possible to try setting the following in your Helm chart values to check whether this fixes it for you?

reloader:
  enabled: true
  image: wallyqs/nats-server-config-reloader:fclose

We've had great preliminary success with this image in a few environments!

I'm facing the very same issue in a KinD cluster running on Linux. I already tried both your ad-hoc image wallyqs/nats-server-config-reloader:fclose and the official 0.7.0 build.
I managed to change the container's command to sleep 3600, exec into it, and run the original command nats-server-config-reloader -pid /var/run/nats/nats.pid -config /etc/nats-config/nats.conf under strace.

The result is that the "too many open files" error comes from the inotify_init1(IN_CLOEXEC) = -1 EMFILE (No file descriptors available) syscall.

So (at least in my case) the root cause is not a missing f.Close but occurs much earlier, in:

configWatcher, err := fsnotify.NewWatcher()

which in turn calls unix.InotifyInit1(unix.IN_CLOEXEC).
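
For context, here's a minimal sketch (not the reloader's actual code, just an illustration of this startup path): on Linux, fsnotify.NewWatcher wraps inotify_init1(2), so once the per-user fs.inotify.max_user_instances limit is exhausted the constructor fails with EMFILE, which Go reports as "too many open files".

package main

import (
    "log"

    "github.com/fsnotify/fsnotify"
)

func main() {
    // On Linux this calls inotify_init1(IN_CLOEXEC); with no inotify
    // instances left for this user it fails with EMFILE, surfaced as
    // "too many open files".
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        log.Fatalf("Error: %v", err)
    }
    defer watcher.Close()

    // The reloader would then watch the config file for changes
    // (path taken from the describe output above).
    if err := watcher.Add("/etc/nats-config/nats.conf"); err != nil {
        log.Fatalf("Error: %v", err)
    }
    log.Println("watcher created; inotify instances are not exhausted on this node")
}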

EDIT: I checked my sysctl settings. I have fs.inotify.max_user_watches = 204800. I increased it to 1M, but it didn't help. What really helped me was to set fs.inotify.max_user_instances=256 (was 128). Now the reloader starts without further issues.
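
As a quick way to check those values on a node (or from inside a container that has /proc), here is a small illustrative snippet; the two sysctls are exposed as files under /proc/sys/fs/inotify on Linux:

package main

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    // fs.inotify.* sysctls map directly to files under /proc/sys/fs/inotify.
    for _, key := range []string{"max_user_instances", "max_user_watches"} {
        b, err := os.ReadFile("/proc/sys/fs/inotify/" + key)
        if err != nil {
            fmt.Printf("fs.inotify.%s: %v\n", key, err)
            continue
        }
        fmt.Printf("fs.inotify.%s = %s\n", key, strings.TrimSpace(string(b)))
    }
}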

Hope it helps.

The issue still appears on 0.11.0 and the current master branch (commit d7f2a3b).

The NATS install that comes bundled with TestKube also suffers from the same issue:

pod/testkube-test-nats-0 2/3 CrashLoopBackOff 25 (3m8s ago) 54m

$ kubectl logs -n testkube-test pod/testkube-test-nats-0 reloader
2023/08/10 23:57:24 Starting NATS Server Reloader v0.7.4
Error: too many open files

This is installed by Helm chart version testkube-1.13.33.

I tried increasing ulimit -n (max open file descriptors) from 1024 to 4096, but it made no difference.

Anything I can do to further narrow it down?
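
One way to narrow it down further (an illustrative sketch, assuming Linux and a readable /proc, not an official tool): count how many inotify instances each process is holding, since the limit being exhausted is typically fs.inotify.max_user_instances rather than the ulimit -n file-descriptor limit. Inotify file descriptors appear as symlinks with the target anon_inode:inotify under /proc/<pid>/fd.

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

func main() {
    counts := map[string]int{}
    fdDirs, _ := filepath.Glob("/proc/[0-9]*/fd")
    for _, fdDir := range fdDirs {
        entries, err := os.ReadDir(fdDir)
        if err != nil {
            continue // process exited or fd dir not readable without root
        }
        for _, e := range entries {
            target, err := os.Readlink(filepath.Join(fdDir, e.Name()))
            if err == nil && target == "anon_inode:inotify" {
                counts[filepath.Dir(fdDir)]++ // keyed by e.g. /proc/1234
            }
        }
    }
    for proc, n := range counts {
        fmt.Printf("%s holds %d inotify instance(s)\n", proc, n)
    }
}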

Same issue here. It seems random: I can sometimes get 1-3 pods running, but it's hard to do and I'm not sure why. The host has been configured with a large open-file allowance.

I ran into this issue as well running NATS 2.10.11 (NATS Server Reloader v0.14.1) on a k3s v1.28.3 cluster with Ubuntu 22.04 nodes. Only one of my nodes had the problem, and I noticed it also had a high number of pod restarts for one deployment. After resolving the issue with that deployment and restarting the affected node, everything has been working fine. I suspect the excessive pod restarts caused too many files to remain open.

I know that others have discussed ulimits. I just wanted to point out that my k3s processes run as a systemd service, because Ubuntu has switched from SysV to systemd. systemd runs each service with its own open-file limits and ignores ulimit. Here is a link to Red Hat documentation, but it will likely apply to any Linux distribution that uses systemd. I hope someone else finds this helpful.
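
As a related sanity check (a sketch, not part of the reloader): the limit that matters is the one the process itself sees, which under systemd comes from the service's LimitNOFILE setting rather than from what ulimit -n prints in a shell. A small Go program can query its own limit directly:

package main

import (
    "fmt"
    "syscall"
)

func main() {
    // Prints the open-file limit as seen by this process; for a
    // systemd-managed service this reflects LimitNOFILE, not the shell's ulimit.
    var rl syscall.Rlimit
    if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
        panic(err)
    }
    fmt.Printf("RLIMIT_NOFILE: soft=%d hard=%d\n", rl.Cur, rl.Max)
}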

I'll just add a few notes from my case. After setting all the open-file limits (systemd, PAM, etc.) as permissive as possible, I was still running into the exact same error. It turned out to be caused by fs.inotify.max_user_instances, which was quite low on my system. Bumping it helped.