Soft delete server on downscale
aniketmaurya opened this issue
When we downscale, a server gets no grace period to finish processing whatever backlog of requests it still has.
As a result, some client requests are lost.
A better way would be to soft delete the server first, wait for a minute, and finally, stop the LightningWork.
Soft delete: just remove the server from the LoadBalancer.servers list but don't stop the ModelServing work.
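To make the proposed order of operations concrete, here is a minimal sketch. It does not use the real Lightning classes: `FakeWork`, `FakeLoadBalancer`, and `downscale` are hypothetical stand-ins, and the `sleep` parameter exists only so the wait can be swapped out in tests. The one-minute default mirrors the grace time suggested above.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(eq=False)  # compare by identity, so two works are never confused
class FakeWork:
    """Hypothetical stand-in for a serving work; not the Lightning API."""
    name: str
    running: bool = True

    def stop(self) -> None:
        self.running = False


@dataclass
class FakeLoadBalancer:
    """Hypothetical stand-in that only tracks which servers receive traffic."""
    servers: List[FakeWork] = field(default_factory=list)


def downscale(
    balancer: FakeLoadBalancer,
    work: FakeWork,
    grace_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    # Step 1 (soft delete): stop routing new requests to this server,
    # but leave the work itself running.
    balancer.servers.remove(work)
    # Step 2: give the server time to finish requests it already accepted.
    sleep(grace_seconds)
    # Step 3: only now stop the work.
    work.stop()
```

A fixed wait is the simplest option. Waiting until the server reports an empty backlog, with the grace time as an upper bound, would be a possible refinement but needs a way to query the backlog.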
Here is how Kubernetes does it.
IMO, the best way to do this would be to override the on_exit method.