LlamaEdge / LlamaEdge

The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge

Home Page: https://llamaedge.com/

The API server should respond with status code 503 when it is busy

juntao opened this issue

Summary

The API server is designed to work on one LLM job at a time. However, because LLM results are returned asynchronously, the server can still accept additional incoming requests while it is waiting for the LLM. In that case, it should respond with status code 503 (Service Unavailable) to all incoming requests until the current LLM job completes.
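
To illustrate the requested behavior, here is a minimal, language-agnostic sketch written in Python. It is not the LlamaEdge implementation; the names `SingleJobGate`, `handle_request`, and `run_job` are hypothetical and exist only for this example. The idea is a non-blocking "busy" flag: the first request claims it, and any request that arrives before it is released gets a 503 immediately instead of queuing.

```python
import threading


class SingleJobGate:
    """Hypothetical helper: admits one job at a time, never blocks callers."""

    def __init__(self):
        self._lock = threading.Lock()

    def try_enter(self) -> bool:
        # Non-blocking acquire: returns False right away if a job is running.
        return self._lock.acquire(blocking=False)

    def leave(self) -> None:
        self._lock.release()


def handle_request(gate: SingleJobGate, run_job):
    """Return a (status_code, body) pair for one incoming request.

    `run_job` stands in for the LLM call (an assumption for this sketch).
    """
    if not gate.try_enter():
        # Another LLM job is in flight: reject instead of waiting.
        return 503, "Service Unavailable: the server is busy with another LLM job"
    try:
        return 200, run_job()
    finally:
        # Always clear the busy flag, even if the job raises.
        gate.leave()
```

Releasing the flag in a `finally` block matters here: if a job fails and the flag is never cleared, the server would answer 503 forever. A real server might also send a `Retry-After` header with the 503 so that clients know when to try again.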

Appendix

No response