During a service update, delayed resource release causes new revision pods to encounter errors and restart until resources are freed.

Question

AyushSawant18588 opened this issue a month ago

When updating a service, the previous revision's pods terminate, but resources such as GPUs take time to free up. As a result, the new revision's pods initially hit CUDA out-of-memory errors because the GPUs still hold the model weights from the previous revision. The new pods therefore restart several times and take a few minutes to reach a running state.

Is this expected behavior for this scenario?

Vincent · Answer 1 · Tue Jul 02 2024 01:30:17 GMT+0800 (China Standard Time)

This is normal. All resources, including CPU, memory, and GPUs, remain taken up by the old revision's pods for as long as those pods exist. Even pods in Terminating status keep occupying their resources until they are actually gone.
New revisions have to wait until those resources are freed before they can get enough for themselves, so it is normal to see them retry and restart.
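
If the restarts are a nuisance, one option (not something the thread itself proposes) is to have the container wait for memory before loading the model instead of crashing. The following is a minimal sketch under stated assumptions: `wait_for_free_memory` and its parameters are hypothetical names, and the caller supplies the probe that reports free bytes.

```python
import time
from typing import Callable


def wait_for_free_memory(
    get_free_bytes: Callable[[], int],
    required_bytes: int,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a caller-supplied probe until enough memory is free.

    Returns True once the probe reports at least required_bytes,
    or False if roughly timeout_s seconds of waiting pass first.
    """
    waited = 0.0
    while True:
        if get_free_bytes() >= required_bytes:
            return True
        if waited >= timeout_s:
            return False
        # The old revision's pods may still be terminating; wait and re-check.
        sleep(interval_s)
        waited += interval_s


# Hypothetical usage with PyTorch (assumes torch is installed and a GPU is visible):
#   import torch
#   ok = wait_for_free_memory(lambda: torch.cuda.mem_get_info()[0], 20 * 1024**3)
#   if not ok:
#       raise SystemExit("GPU memory was not released in time")
```

The probe is passed in so the same loop works with any source of free-memory readings. Whether waiting inside the container is preferable to letting the pod restart depends on the startup and readiness timeouts configured for the service, so treat the default timeout and interval here as placeholders.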