minio / minio

The Object Store for AI Data Infrastructure

Home Page: https://min.io/download


High Memory usage after recent upgrade.

prasadkris opened this issue

NOTE

If this case is urgent, please subscribe to Subnet so that our 24/7 support team may help you faster.

Expected Behavior

No intermittent memory spikes in the pods.

Current Behavior

We have a MinIO cluster running on our self-hosted Kubernetes cluster, deployed using the Bitnami chart. Generally, it works great, but we upgraded from release 2023.11.1 to 2024.4.18 on April 19th, and since then we've observed a significant increase in memory usage in the pods. The usage is not uniform across the pods, and there are frequent spikes in some of them, which leads to the pods being frequently OOMKilled. The only workaround is to increase the resource allocation, which is not sustainable.

We can clearly see from the Grafana screenshot below that the pods' memory usage was fine until April 19th, when the upgrade was performed. Afterwards, the usage increased with intermittent spikes. We have increased the resource limit a couple of times, but the usage keeps growing. The only way to stop it is either to restart the pod or to keep increasing the resource limit.

[Grafana screenshot: pod memory usage, showing the increase and intermittent spikes after the April 19th upgrade]

Possible Solution

Steps to Reproduce (for bugs)

Deploy MinIO using Bitnami chart version 12.8.19 first and observe the memory usage; there won't be any issues. Then upgrade to chart version 14.1.8 and observe the usage; you should see abnormal memory usage.

Context

The pods frequently get OOM-killed; the only way out is to increase memory allocation.

Regression

I am not sure.

Your Environment

  • Version used (minio --version): DEVELOPMENT.2024-04-18T19-09-19Z (commit-id=98f7821eb3f60a6ece3125348a121a1238d02159)
  • Server setup and configuration:
 mc admin info minio
●  minio-0.minio-headless.minio.svc.cluster.local:9000
   Uptime: 1 day 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

●  minio-1.minio-headless.minio.svc.cluster.local:9000
   Uptime: 1 day 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

●  minio-2.minio-headless.minio.svc.cluster.local:9000
   Uptime: 1 day 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

●  minio-3.minio-headless.minio.svc.cluster.local:9000
   Uptime: 1 day 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

●  minio-4.minio-headless.minio.svc.cluster.local:9000
   Uptime: 10 hours 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

●  minio-5.minio-headless.minio.svc.cluster.local:9000
   Uptime: 1 day 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

●  minio-6.minio-headless.minio.svc.cluster.local:9000
   Uptime: 1 day 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

●  minio-7.minio-headless.minio.svc.cluster.local:9000
   Uptime: 1 day 
   Version: 2024-04-18T19:09:19Z
   Network: 8/8 OK 
   Drives: 1/1 OK 
   Pool: 1

Pools:
   1st, Erasure sets: 1, Drives per erasure set: 8

100 GiB Used, 18 Buckets, 297,821 Objects
8 drives online, 0 drives offline
  • Operating System and version (uname -a): Linux - 5.15.0-91-generic

How about the CPU metrics? @prasadkris

Memory pre-allocation was changed between your two versions. It will preallocate 75% of what it sees as available, or 4GB if the limit cannot be detected on Linux from /sys/fs/cgroup/memory.max or /sys/fs/cgroup/memory/memory.limit_in_bytes.
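
As a rough illustration of that detection logic, the Go sketch below reads the cgroup v2 and v1 limit files named above and takes 75% of whatever it finds, falling back to 4GB. This is only a simplified approximation of the behaviour described in this comment, not MinIO's actual implementation; the helper name readCgroupLimit is made up for the example.

package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readCgroupLimit returns the memory limit in bytes from a cgroup file,
// or false if the file is missing or holds no usable numeric limit
// (cgroup v2 writes the literal string "max" when no limit is set).
func readCgroupLimit(path string) (uint64, bool) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, false
	}
	s := strings.TrimSpace(string(data))
	if s == "max" {
		return 0, false
	}
	limit, err := strconv.ParseUint(s, 10, 64)
	if err != nil || limit == 0 {
		return 0, false
	}
	return limit, true
}

func main() {
	const fallback = 4 << 30 // 4 GiB when no limit can be detected

	// Try cgroup v2 first, then cgroup v1 (paths as named in the comment above).
	limit, ok := readCgroupLimit("/sys/fs/cgroup/memory.max")
	if !ok {
		limit, ok = readCgroupLimit("/sys/fs/cgroup/memory/memory.limit_in_bytes")
	}
	if !ok {
		limit = fallback
	}

	prealloc := limit / 100 * 75 // 75% of the detected limit
	fmt.Printf("detected limit: %d bytes, preallocation target: %d bytes\n", limit, prealloc)
}

In a Kubernetes pod, the container's memory limit is what shows up in those cgroup files, so an 8GB limit would yield roughly a 6GB preallocation target, as discussed below.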

Also note we don't support DEVELOPMENT "releases" - this is not an official release.

Thanks @klauspost - just for clarification, I am running MinIO as a deployment in Kubernetes, and we have set a memory limit of 8GB. In this case, MinIO should only preallocate around 6GB (75% of 8GB). However, it appears that in my case it consumes/preallocates more than that, leading to OOMs.

That is the preallocation - and it does match up with your graph. Running requests will cause additional allocations.

Ok, thank you 🙏🏻 So, if I understand correctly, we need to set the memory limit with the expectation that 75 percent of it will be preallocated, and then leave a buffer for running requests on top? For example, if the maximum memory use before the upgrade was 2.5G, we would need to set 10G as the limit, since 7.5G of that would be preallocated?
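
If that reading is right, the sizing arithmetic boils down to: with a fraction p of the limit preallocated, the remaining (1 - p) share has to cover the peak working set, so limit >= peak / (1 - p). The sketch below just reproduces the numbers in this question (2.5G peak, 75% preallocation, giving 10G); it is a back-of-the-envelope illustration, not official sizing guidance, and assumes the preallocation counts against the container limit as discussed above.

package main

import "fmt"

// recommendedLimit returns a container memory limit (bytes) such that, after
// preallocFrac of the limit is preallocated, the remainder still covers the
// observed peak working set: limit*(1-preallocFrac) >= peak, so
// limit >= peak / (1 - preallocFrac).
func recommendedLimit(peakBytes uint64, preallocFrac float64) uint64 {
	return uint64(float64(peakBytes) / (1 - preallocFrac))
}

func main() {
	const gib = 1 << 30
	peak := uint64(2.5 * gib) // peak memory usage observed before the upgrade
	fmt.Printf("suggested limit: %.1f GiB\n", float64(recommendedLimit(peak, 0.75))/gib)
}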