Lightning-Universe / stable-diffusion-deploy

Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load balancing, orchestration, pre-provisioning, dynamic batching, GPU inference, and microservices working together via the Lightning Apps framework.

Home Page: https://lightning.ai/muse

Soft delete server on downscale

aniketmaurya opened this issue · comments

When we downscale the servers, we don't give a server any grace time to process the backlog it might still have.
As a result, some client requests are lost.

A better way would be to soft delete the server first, wait for a minute, and only then stop the LightningWork.

Soft delete: just remove the server from the LoadBalancer.servers list, so it receives no new requests, but don't stop the ModelServing work.
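To make the proposal concrete, here is a minimal sketch of the soft-delete flow in plain Python. It does not use the Lightning API: `FakeWork`, `FakeLoadBalancer`, `soft_delete`, `reap`, and the `draining` map are hypothetical names made up for illustration, and only the `servers` list and the one-minute grace period come from the description above.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeWork:
    """Hypothetical stand-in for a ModelServing LightningWork."""
    url: str
    stopped: bool = False

    def stop(self) -> None:
        self.stopped = True


@dataclass
class FakeLoadBalancer:
    """Hypothetical stand-in for the LoadBalancer (only `servers` is from the issue)."""
    servers: list = field(default_factory=list)
    # url -> (work, deadline): servers that were soft deleted but not yet stopped
    draining: dict = field(default_factory=dict)

    def soft_delete(self, work, grace_seconds=60.0, now=None):
        """Stop routing to `work`, but keep it running until the deadline."""
        now = time.monotonic() if now is None else now
        if work.url in self.servers:
            self.servers.remove(work.url)
        self.draining[work.url] = (work, now + grace_seconds)

    def reap(self, now=None):
        """Stop every draining work whose grace period has expired."""
        now = time.monotonic() if now is None else now
        stopped = []
        for url, (work, deadline) in list(self.draining.items()):
            if now >= deadline:
                work.stop()
                del self.draining[url]
                stopped.append(url)
        return stopped
```

In this sketch the autoscaler would call `soft_delete` when it decides to downscale and call `reap` on each loop iteration, so nothing blocks while the server drains its backlog.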

Here is how Kubernetes does it.

IMO, the best way to do this would be overriding the on_exit method.
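A rough sketch of what that could look like, assuming `on_exit` is called before the work is terminated and that the server keeps its pending requests in an in-memory queue. `ModelServerSketch`, `backlog`, `submit`, and `process` are hypothetical names and do not come from the repository; the class is a plain-Python stand-in, not a real LightningWork.

```python
from collections import deque


class ModelServerSketch:
    """Hypothetical stand-in for the ModelServing work; not the Lightning API."""

    def __init__(self):
        self.backlog = deque()  # requests accepted but not yet processed
        self.results = []

    def submit(self, request):
        self.backlog.append(request)

    def process(self, request):
        # Placeholder for the actual inference call.
        return f"done:{request}"

    def on_exit(self):
        # Drain whatever is still queued before the work shuts down,
        # so that requests accepted earlier are not dropped.
        while self.backlog:
            self.results.append(self.process(self.backlog.popleft()))
```

The draining logic lives with the server itself here, so the load balancer only has to stop sending new requests and then stop the work.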