vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Home Page:https://docs.vllm.ai

[Feature]: Health check for restart policy

pseudotensor opened this issue · comments

🚀 The feature, motivation and pitch

A small change would recover a lot of reliability.

We see vLLM crash or hang in various ways, e.g.:

#4108
#4344

And manually managing that is a hassle.

The vLLM team could easily add a HEALTHCHECK line to the Dockerfile so that tools like autoheal can function.

https://hub.docker.com/r/willfarrell/autoheal/
https://docs.docker.com/reference/dockerfile/#healthcheck

It would look like:

HEALTHCHECK --interval=5m --timeout=10s CMD curl -f http://localhost/health || exit 1

This would allow one to use the autoheal Docker image to restart vLLM containers that report as unhealthy.
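If curl is not installed in the image, the same probe could be sketched in Python, which the vLLM image already ships. This is only an illustration, not the project's implementation: the `probe` and `main` names are hypothetical, and the default URL assumes the server listens on port 8000 and exposes `/health`; adjust both to match the actual deployment.

```python
import http.client
import urllib.error
import urllib.request

# Assumption: the server listens on port 8000 and serves /health.
DEFAULT_URL = "http://localhost:8000/health"


def probe(url: str, timeout: float = 10.0) -> int:
    """Return 0 if the endpoint answers with a 2xx status, else 1.

    Mirrors `curl -f <url> || exit 1`: any connection error, timeout,
    or HTTP error status counts as unhealthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        # Refused connection, timeout, HTTP 4xx/5xx, or malformed URL.
        return 1


def main(argv: list[str]) -> int:
    """Probe argv[1] if given, otherwise the assumed default URL."""
    url = argv[1] if len(argv) > 1 else DEFAULT_URL
    return probe(url)
```

Saved as a script that ends with `sys.exit(main(sys.argv))`, this could stand in for the curl command after `CMD` in the HEALTHCHECK line. Docker treats exit code 0 as healthy and 1 as unhealthy.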

Alternatives

Manual labor

Additional context

No response

@pseudotensor

Can you open a PR?