Soft delete server on downscale

Question

aniketmaurya opened this issue 2 years ago

When we downscale, we stop servers immediately and give them no grace period to process any backlog they still have.
As a result, some client requests are lost.

A better way would be to soft delete the server first, wait a minute so it can finish its pending requests, and only then stop the LightningWork.

Soft delete: just remove the server from the LoadBalancer.servers list but don't stop the ModelServing work.
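
A minimal sketch of the proposed sequence, to make the three steps concrete. `StubWork`, `StubLoadBalancer`, and `soft_delete_then_stop` are hypothetical stand-ins written for this illustration, not the actual Lightning classes; the only behaviour assumed from the description above is that the load balancer keeps a `servers` list and that a work can be stopped.

```python
import time
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: compare works by identity, not by field values
class StubWork:
    """Hypothetical stand-in for a ModelServing work."""
    name: str
    stopped: bool = False

    def stop(self) -> None:
        self.stopped = True


@dataclass
class StubLoadBalancer:
    """Hypothetical stand-in for the load balancer; only tracks routable servers."""
    servers: list = field(default_factory=list)


def soft_delete_then_stop(balancer, work, grace_seconds=60.0, sleep=time.sleep):
    """Take `work` out of rotation, wait out the grace period, then stop it."""
    # Step 1 (soft delete): stop routing new requests to this server,
    # but leave the work itself running so it can drain its backlog.
    if work in balancer.servers:
        balancer.servers.remove(work)
    # Step 2: grace period for requests that were already accepted.
    sleep(grace_seconds)
    # Step 3: only now stop the work.
    work.stop()


if __name__ == "__main__":
    a, b = StubWork("server-a"), StubWork("server-b")
    lb = StubLoadBalancer(servers=[a, b])
    soft_delete_then_stop(lb, b, grace_seconds=0.1)
    print([w.name for w in lb.servers], b.stopped)
```

The blocking `sleep` keeps the sketch short. In a real autoscaler loop it would probably be preferable to record the time of the soft delete and stop the work on a later iteration once the grace period has elapsed, so that other servers keep being managed in the meantime.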

Here is how Kubernetes does it.

IMO, the best way to do this would be to override the on_exit method.