The API server should respond with status code 503 when it is busy
juntao opened this issue
Michael Yuan commented
Summary
The API server is designed to work on one LLM job at a time. However, because LLM results are returned asynchronously, the server can still accept additional incoming requests while it waits for the LLM. In that case, it should respond with status code 503 (Service Unavailable) to every incoming request until the current LLM job completes.
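To make the requested behavior concrete, here is a minimal sketch in Python. It is not the server's actual code: `SingleJobGate`, `handle_request`, and the `run_llm_job` callback are hypothetical names used only for illustration. The idea is a non-blocking "busy" flag that a request must claim before starting an LLM job. A request that cannot claim it gets 503 immediately instead of waiting.

```python
import threading


class SingleJobGate:
    """Hypothetical guard that lets at most one LLM job run at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def try_begin(self) -> bool:
        # Non-blocking: returns False right away if a job is already running.
        return self._lock.acquire(blocking=False)

    def finish(self) -> None:
        # Mark the server as idle again.
        self._lock.release()


def handle_request(gate, run_llm_job, prompt):
    """Return (status_code, body) for one incoming request.

    `run_llm_job` is an assumed callback standing in for the real LLM call.
    """
    if not gate.try_begin():
        # The server is busy with another job: reject instead of queueing.
        return 503, "Service Unavailable: an LLM job is already running"
    try:
        return 200, run_llm_job(prompt)
    finally:
        # Always release the gate, even if the job raises.
        gate.finish()
```

With this shape, a request that arrives while a job is in flight sees 503, and the next request after the job completes is served normally. A real server might also send a `Retry-After` header with the 503 response, which HTTP permits, so clients know when to try again.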
Appendix
No response