minio / minio

The Object Store for AI Data Infrastructure

Home Page: https://min.io/download


High Memory usage after recent upgrade.

prasadkris opened this issue

NOTE

If this case is urgent, please subscribe to Subnet so that our 24/7 support team may help you faster.

Expected Behavior

No intermittent memory spikes in the pods.

Current Behavior

We have a MinIO cluster running on our self-hosted Kubernetes cluster, deployed using the Bitnami chart. Generally, it works great, but we upgraded from release 2023.11.1 to 2024.4.18 on April 19th, and since then we've observed a significant increase in memory usage in the pods. The usage is not uniform across the pods, and there are frequent spikes in some of them, which leads to the pods being frequently OOMKilled. The only workaround is to increase the resource allocation, which is not sustainable.

We can clearly see from the Grafana screenshot below that the pods' memory usage was fine until April 19th, when the upgrade was performed. Afterwards, the usage increased with intermittent spikes. We have increased the resource limit a couple of times, but the usage keeps growing. The only way to stop it is either to restart the pod or to keep increasing the resource limit.

[Grafana screenshot: pod memory usage, showing the increase and intermittent spikes after the April 19th upgrade]

Possible Solution

Steps to Reproduce (for bugs)

Deploy MinIO using Bitnami chart version 12.8.19 first and observe the memory usage; there won't be any issues. Then upgrade to chart version 14.1.8 and observe the usage; you should see abnormal memory usage.

Context

The pods frequently get OOM-killed; the only way out is to increase memory allocation.

Regression

I am not sure.

Your Environment

  • Version used (minio --version): DEVELOPMENT.2024-04-18T19-09-19Z (commit-id=98f7821eb3f60a6ece3125348a121a1238d02159)
  • Server setup and configuration:
 mc admin info minio
●  minio-0.minio-headless.minio.svc.cluster.local:9000
   Uptime: 1 day 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

●  minio-1.minio-headless.minio.svc.cluster.local:9000
   Uptime: 1 day 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

●  minio-2.minio-headless.minio.svc.cluster.local:9000
   Uptime: 1 day 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

●  minio-3.minio-headless.minio.svc.cluster.local:9000
   Uptime: 1 day 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

●  minio-4.minio-headless.minio.svc.cluster.local:9000
   Uptime: 10 hours 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

●  minio-5.minio-headless.minio.svc.cluster.local:9000
   Uptime: 1 day 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

●  minio-6.minio-headless.minio.svc.cluster.local:9000
   Uptime: 1 day 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

●  minio-7.minio-headless.minio.svc.cluster.local:9000
   Uptime: 1 day 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

Pools:
   1st, Erasure sets: 1, Drives per erasure set: 8

100 GiB Used, 18 Buckets, 297,821 Objects
8 drives online, 0 drives offline
  • Operating System and version (uname -a): Linux - 5.15.0-91-generic

How about the CPU metrics? @prasadkris

Memory pre-allocation was changed between your two versions. It will preallocate 75% of what it sees as available, or 4GB if the limit cannot be detected on Linux from /sys/fs/cgroup/memory.max or /sys/fs/cgroup/memory/memory.limit_in_bytes.
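
As a rough illustration of that detection logic, the Go sketch below reads the cgroup v2 and v1 limit files named above and takes 75% of whatever it finds, falling back to 4GB. This is only a simplified approximation of the behaviour described in this comment, not MinIO's actual implementation; the helper name readCgroupLimit is made up for the example.

package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readCgroupLimit returns the memory limit in bytes from a cgroup file,
// or false if the file is missing or holds no usable numeric limit
// (cgroup v2 writes the literal string "max" when no limit is set).
func readCgroupLimit(path string) (uint64, bool) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, false
	}
	s := strings.TrimSpace(string(data))
	if s == "max" {
		return 0, false
	}
	limit, err := strconv.ParseUint(s, 10, 64)
	if err != nil || limit == 0 {
		return 0, false
	}
	return limit, true
}

func main() {
	const fallback = 4 << 30 // 4 GiB when no limit can be detected

	// Try cgroup v2 first, then cgroup v1 (paths as named in the comment above).
	limit, ok := readCgroupLimit("/sys/fs/cgroup/memory.max")
	if !ok {
		limit, ok = readCgroupLimit("/sys/fs/cgroup/memory/memory.limit_in_bytes")
	}
	if !ok {
		limit = fallback
	}

	prealloc := limit / 100 * 75 // 75% of the detected limit
	fmt.Printf("detected limit: %d bytes, preallocation target: %d bytes\n", limit, prealloc)
}

In a Kubernetes pod, the container's memory limit is what shows up in those cgroup files, so an 8GB limit would yield roughly a 6GB preallocation target, as discussed below.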

Also note we don't support DEVELOPMENT "releases" - this is not an official release.

Thanks @klauspost - just for clarification, I am running MinIO as a deployment in Kubernetes, and we have set a memory limit of 8GB. In this case, MinIO should only preallocate around 6GB (75% of 8GB). However, it appears that in my case it consumes/preallocates more than that, leading to OOMs.

That is the preallocation - and it does match up with your graph. Running requests will cause additional allocations.

Ok, thank you 🙏🏻 So, if I understand correctly, we need to set the memory limit with the expectation that 75 percent of it will be preallocated, and then leave a buffer for running requests on top? For example, if the maximum memory use before the upgrade was 2.5G, we would need to set 10G as the limit, since 7.5G of that would be preallocated?
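
If that reading is right, the sizing arithmetic boils down to: with a fraction p of the limit preallocated, the remaining (1 - p) share has to cover the peak working set, so limit >= peak / (1 - p). The sketch below just reproduces the numbers in this question (2.5G peak, 75% preallocation, giving 10G); it is a back-of-the-envelope illustration, not official sizing guidance, and assumes the preallocation counts against the container limit as discussed above.

package main

import "fmt"

// recommendedLimit returns a container memory limit (bytes) such that, after
// preallocFrac of the limit is preallocated, the remainder still covers the
// observed peak working set: limit*(1-preallocFrac) >= peak, so
// limit >= peak / (1 - preallocFrac).
func recommendedLimit(peakBytes uint64, preallocFrac float64) uint64 {
	return uint64(float64(peakBytes) / (1 - preallocFrac))
}

func main() {
	const gib = 1 << 30
	peak := uint64(2.5 * gib) // peak memory usage observed before the upgrade
	fmt.Printf("suggested limit: %.1f GiB\n", float64(recommendedLimit(peak, 0.75))/gib)
}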