knative-extensions / serving-progressive-rollout

Knative Serving extension to roll out the revision progressively

During a service update, delayed resource release causes new revision pods to encounter errors and restart until resources are freed.

AyushSawant18588 opened this issue

When updating a service, the previous revision pods terminate, but resources like GPUs take time to free up. As a result, new revision pods initially encounter CUDA out of memory errors because the GPUs are still occupied with the model weights from the previous revision. Consequently, the new revision pods restart several times and take a few minutes to reach a running state.
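Part of the "few minutes" likely comes from the kubelet's restart back-off rather than from the model load itself. Below is a minimal sketch that computes how long a repeatedly crashing container spends just waiting between restarts. It assumes the documented default CrashLoopBackOff policy (10 s initial delay, doubled after each crash, capped at 300 s). Newer Kubernetes versions or feature gates may use different values. The function names are illustrative and are not part of any Kubernetes API.

```python
# Sketch only: models the kubelet's default restart back-off, ASSUMING the
# documented defaults (10 s initial delay, doubled per crash, capped at 300 s).
# Actual values can differ across Kubernetes versions and feature gates.

INITIAL_DELAY_S = 10   # assumed default initial back-off
MAX_DELAY_S = 300      # assumed default cap (5 minutes)


def restart_delays(crashes: int) -> list[int]:
    """Back-off delay (seconds) before each restart, one entry per crash."""
    delays = []
    delay = INITIAL_DELAY_S
    for _ in range(crashes):
        delays.append(min(delay, MAX_DELAY_S))
        delay *= 2
    return delays


def total_backoff(crashes: int) -> int:
    """Total seconds spent waiting in back-off, excluding container run time."""
    return sum(restart_delays(crashes))


print(restart_delays(4), total_backoff(4))  # [10, 20, 40, 80] 150
```

For example, four crashes already add 150 s of back-off (10 + 20 + 40 + 80). That figure does not yet include the time each attempt spends starting up and trying to load the model weights.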

Is this expected behavior for this scenario?

This is normal. As long as the old revision's pods exist, they hold on to all of their resources (CPU, memory, GPUs, and so on). Even a pod in Terminating status keeps its resources until it is fully gone.
New revision pods have to wait until those resources are released before they can get what they need, so it is normal to see them retry and restart in the meantime.
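The reasoning above can be illustrated with a toy model. This is a hypothetical sketch, not Knative or kubelet code. The pod name, the 80 GiB GPU, and the 60 GiB model size are made-up values. "Terminating" mirrors what `kubectl` displays rather than a formal pod phase. The model shows that memory held by the old revision's pod still counts as used until the pod is gone, so a new pod cannot load the same weights while the old one is still terminating.

```python
# Hypothetical toy model, not Knative or kubelet code. ASSUMPTION: a pod's GPU
# memory is released only once the pod is fully gone, matching the reply above.
from dataclasses import dataclass, field
from enum import Enum


class PodState(Enum):
    RUNNING = "Running"
    TERMINATING = "Terminating"  # what kubectl shows; not a formal pod phase
    GONE = "Gone"


@dataclass
class Pod:
    name: str
    gpu_mem_gib: float
    state: PodState = PodState.RUNNING


@dataclass
class Gpu:
    total_gib: float
    pods: list[Pod] = field(default_factory=list)

    def used_gib(self) -> float:
        # Running and Terminating pods both still hold their memory.
        return sum(p.gpu_mem_gib for p in self.pods if p.state is not PodState.GONE)

    def can_load(self, need_gib: float) -> bool:
        """True if a new pod needing `need_gib` would fit right now."""
        return self.total_gib - self.used_gib() >= need_gib


# Made-up sizes: an 80 GiB GPU and a 60 GiB model in both revisions.
gpu = Gpu(total_gib=80)
old = Pod("old-revision-pod", gpu_mem_gib=60)
gpu.pods.append(old)

old.state = PodState.TERMINATING
print(gpu.can_load(60))  # False: the terminating pod still holds 60 GiB
old.state = PodState.GONE
print(gpu.can_load(60))  # True: memory is free once the old pod is gone
```

In this model, a new pod that starts during the Terminating window hits the same out-of-memory condition described in the issue. It only succeeds on a later attempt, after the old pod has disappeared.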