[Feature]: Health check for restart policy
pseudotensor opened this issue · comments
PSEUDOTENSOR / Jonathan McKinney commented
The feature, motivation and pitch
A small change that would win back a lot of reliability.
We see vLLM crash or hang in various ways, and managing that manually is a hassle.
The vLLM team could easily add a HEALTHCHECK instruction to the Dockerfile so that tools like autoheal can work.
https://hub.docker.com/r/willfarrell/autoheal/
https://docs.docker.com/reference/dockerfile/#healthcheck
It would look like:
HEALTHCHECK --interval=5m --timeout=10s CMD curl -f http://localhost/health || exit 1
This would allow the autoheal image to watch the vLLM containers and restart any that report as unhealthy.
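If curl turns out not to be installed in the image, the same probe could probably be done with the Python interpreter that is already there. A rough sketch, assuming the server answers a GET on its health URL with HTTP 200 when it is alive; the names `is_healthy` and `exit_code` are made up for illustration and are not part of vLLM:

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers connection refused, timeouts, error statuses such as 503
        # (raised as HTTPError, a URLError subclass), and malformed URLs.
        return False


def exit_code(url: str, timeout: float = 10.0) -> int:
    """Map the probe result to the exit status Docker expects: 0 healthy, 1 unhealthy."""
    return 0 if is_healthy(url, timeout) else 1
```

A small wrapper script (say a hypothetical `healthcheck.py`) that ends with `sys.exit(exit_code(url))` could then stand in for the curl command after `CMD`. The URL has to match the host and port the server actually listens on, which I have not checked for the official image.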
Alternatives
Manual labor
Additional context
No response
Robert Shaw commented
Can you open a PR?