Promote node resource over-commitment to GA
caohe opened this issue
Why is this needed?
In v0.4, we released an MVP of node resource over-commitment that implements the basic features.
In v0.5, we plan to enhance this feature and promote it to GA.
What would you like to be added?
- Dynamic over-commitment ratio adjustment: To estimate the amount of over-committable resources more accurately, we will combine long-term and short-term prediction algorithms when calculating it. #472
- Interference detection and mitigation: To avoid resource contention caused by over-commitment, we will introduce multi-dimensional interference detection strategies based on CPU load and usage, memory usage, the kswapd reclaim rate, etc. We will also introduce multi-tiered mitigation measures, including scheduling prevention, eviction, etc. #518
- Compatibility with core binding: Exclude bound cores from over-commitment, so that the scheduler does not place more Pods that require core binding on a node than the node has cores for, which would cause those Pods to fail to start. #472
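
To make the first and third items more concrete, here is a minimal Go sketch of how a dynamic ratio could combine the two predictions while leaving bound cores out of the calculation. It is only an illustration under stated assumptions, not the implementation proposed in #472: the `NodeCPU` type, its fields, the `OvercommitRatio` function, and the choice of taking the larger of the two predictions are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// NodeCPU is a hypothetical summary of a node's CPU state, in cores.
// None of these names come from the actual code base.
type NodeCPU struct {
	Allocatable   float64 // total allocatable cores on the node
	BoundCores    float64 // cores exclusively bound to Pods; never over-committed
	LongTermPeak  float64 // predicted shared-pool usage from a long-term model
	ShortTermPeak float64 // predicted shared-pool usage from a short-term model
}

// OvercommitRatio returns the factor by which the node's allocatable CPU
// could be advertised. maxRatio is assumed to be >= 1 and caps the result.
func OvercommitRatio(n NodeCPU, maxRatio float64) float64 {
	// Only the cores that are not bound are candidates for over-commitment.
	shared := n.Allocatable - n.BoundCores
	if n.Allocatable <= 0 || shared <= 0 {
		return 1.0
	}
	// Assumption: be conservative and trust the higher of the two predictions.
	predicted := math.Max(n.LongTermPeak, n.ShortTermPeak)
	// Cores in the shared pool that are predicted to stay idle.
	reclaimable := math.Max(shared-predicted, 0)
	ratio := (n.Allocatable + reclaimable) / n.Allocatable
	// Clamp to [1, maxRatio].
	return math.Min(math.Max(ratio, 1.0), maxRatio)
}

func main() {
	node := NodeCPU{Allocatable: 32, BoundCores: 8, LongTermPeak: 12, ShortTermPeak: 16}
	fmt.Printf("over-commitment ratio: %.2f\n", OvercommitRatio(node, 1.5))
}
```

With 32 allocatable cores, 8 of them bound, and a predicted peak of 16 cores in the shared pool, 8 cores are considered reclaimable and the ratio comes out as 40/32 = 1.25. A node whose cores are all bound keeps a ratio of 1, which matches the intent of the core binding item.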