Automatically remove sessions in Pending state
pameladelgado opened this issue · comments
Pamela Delgado commented
Sessions that have been stuck in a Pending/Errored state for a threshold amount of time should be removed automatically.
Andreas Bleuler commented
That's a good point. I think we can also define a maximum number of restarts (e.g. 10?) after which the session would be deleted.
Tasko Olevski commented
Some more details and requirements:
- there should be a flag in the values file to enable or disable this
- the limits for how long a session should be in a "stuck" state should also be modifiable - see the implementation for culling - this is just an alternative way of culling
- these stuck states are `failed` or `starting`, from the choices in the state enum
In a bit more detail (and almost identical to the culling), this would work like this:
- add a parameter to the "culling" section of the CRD that contains the limit of how long a session should be in the `starting` state before it is culled
- add another parameter to the culling section of the CRD that indicates how long a session can be in the `failed` state before it is culled
- in the status section of the jupyterserver manifest add two fields, `startingSince` and `failedSince`, which contain an ISO 8601 timestamp of the time (in UTC) when the session entered the state
- update the kopf session state handler to write the two timestamps in the session status
- add another kopf timer similar to the culling timer that checks `status.failedSince` on the manifest, compares this to `culling.maxFailedAge`, and if conditions are right removes the server
- repeat the above for the "starting" state - you can re-use the same kopf timer or add a separate one
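For the state handler step, a rough sketch of the timestamp bookkeeping could look like the following. It is plain Python with no kopf imports, so it only shows the logic that a handler might call. The helper name `since_patch` and the `SINCE_FIELDS` mapping are made up for illustration; only the field names `startingSince` and `failedSince` come from the list above. It assumes the handler can write the returned dict into the status as a merge patch, where a `None` value clears the field.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical mapping from state names to the proposed status fields.
SINCE_FIELDS = {"starting": "startingSince", "failed": "failedSince"}


def since_patch(status: dict, new_state: str, now: Optional[datetime] = None) -> dict:
    """Return the status fields to change when a session is observed in new_state.

    The timestamp is written only when the session first enters the state, so
    repeated handler calls do not reset the clock. It is cleared (None) once the
    session is observed in any other state.
    """
    now = now or datetime.now(timezone.utc)
    patch = {}
    for state, field in SINCE_FIELDS.items():
        if state == new_state:
            if not status.get(field):
                # First time we see this state: record the entry time in UTC.
                patch[field] = now.isoformat()
        elif status.get(field):
            # The session left this state: drop the stale timestamp.
            patch[field] = None
    return patch
```

Keeping the timestamps in the status means the timer needs no memory of its own, which matches the tip that all state lives in the manifest.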
Tips:
- all state (if any is needed) is in the manifest
- a value of 0 for the culling thresholds means that the session will not be culled for that specific case - it is the same for culling based on idleness
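The check that the timer would run, including the "0 means never cull" rule, might be sketched like this. Again this is plain Python rather than actual kopf code. `culling.maxFailedAge` and the two status fields are taken from the description above; `maxStartingAge`, the helper names, and the assumption that thresholds are given in seconds are placeholders for illustration only.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def exceeded(since: Optional[str], max_age_seconds: int, now: datetime) -> bool:
    """True if the ISO 8601 timestamp `since` is older than the threshold.

    A missing timestamp or a threshold of 0 disables the check.
    """
    if not since or max_age_seconds <= 0:
        return False
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    entered = datetime.fromisoformat(since.replace("Z", "+00:00"))
    if entered.tzinfo is None:
        # Timestamps are expected in UTC; treat naive values as UTC.
        entered = entered.replace(tzinfo=timezone.utc)
    return now - entered > timedelta(seconds=max_age_seconds)


def should_cull(status: dict, culling: dict, now: Optional[datetime] = None) -> bool:
    """Decide whether a session is stuck in failed or starting for too long."""
    now = now or datetime.now(timezone.utc)
    failed_too_long = exceeded(
        status.get("failedSince"), culling.get("maxFailedAge", 0), now
    )
    # maxStartingAge is a hypothetical name mirroring maxFailedAge.
    starting_too_long = exceeded(
        status.get("startingSince"), culling.get("maxStartingAge", 0), now
    )
    return failed_too_long or starting_too_long
```

A timer body could then call `should_cull(status, spec_culling)` and delete the server when it returns `True`. Both inputs come straight from the manifest, so the same timer can cover the failed and starting cases.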