Change plugin's `Reserve()` method so it cannot fail

sharnoff opened this issue 4 months ago · comments

Problem description / Motivation

This seems counterintuitive, but: if the scheduler plugin's internal state does not match the cluster (either because it's delayed, or inconsistent due to some bug), we can end up performing the correct action at the Filter step (i.e. allowing a Pod onto a node) but then incorrectly rejecting it at the Reserve step.

When this kind of inconsistency happens, it's often made more severe by the fact that the scheduler framework does not take Reserve failures into account when it retries scheduling a Pod. If Filter and Reserve disagree, the Pod can get stuck repeatedly failing to be scheduled onto the same node, even if there's room elsewhere in the cluster.

In practice, we find we're much more likely to have Reserve failures because of bugs in our scheduler plugin (false positives) rather than racy resource acquisition (true positives).

Feature idea(s) / DoD

Change Reserve so that it cannot reject the pod.

Implementation ideas

This should be as simple as changing the value of a boolean passed to `(*AutoscaleEnforcer).reserveResources()`.

This must be discussed internally before implementing — we should make sure that we can continue to detect issues that would have previously caused scheduling failures.

Stefan Radig · Answer 1 · Tue Mar 26 2024 23:13:29 GMT+0800 (China Standard Time)

We should change this as proposed, but also add a metric that counts how many times the old version would have failed the reservation.