If a request arrives while the queue is already full (i.e., it exceeds the current job_queue_size), the following error is returned:
{'code': 503, 'type': 'ServiceUnavailableException', 'message': 'Model "empty_model" has no worker to serve inference request. Please use scale workers API to add workers.'}
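The error message itself suggests the remedy. A minimal sketch of two options, assuming TorchServe's default ports; my_model is a placeholder model name, and the worker count and queue size are arbitrary example values:

```bash
# Option 1: add workers via the scale-workers call on the Management API (port 8081).
# "my_model" is a placeholder model name; 3 is an arbitrary worker count.
curl -X PUT "http://localhost:8081/models/my_model?min_worker=3"

# Option 2: raise the queue depth (default 100) in config.properties,
# then restart TorchServe so the setting takes effect.
echo "job_queue_size=200" >> config.properties
```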
Q2. How to check model status (using the /ping endpoint, a new feature in TorchServe 0.8.0)
The Inference API (port 8080)'s /ping endpoint reports the health of the frontend (the Java server).
Starting with TorchServe 0.8.0, you can detect whether backend model initialization has failed. Because TorchServe automatically restarts a backend worker whose initialization fails, you can set maxRetryTimeoutInSec in the model's model-config.yml file to put a time limit on those retry attempts.
This matters because, previously, /ping always returned 200 (healthy) once TorchServe started, even if model initialization had failed. With this feature, if initialization is still failing after maxRetryTimeoutInSec has elapsed, the server returns 500 (unhealthy).
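A minimal sketch of the two pieces involved, assuming the default ports; the 300-second timeout is an arbitrary example value:

```bash
# model-config.yml is packaged with the model archive; once initialization has
# kept failing for maxRetryTimeoutInSec seconds, TorchServe stops retrying the worker.
cat > model-config.yml <<'EOF'
maxRetryTimeoutInSec: 300
EOF

# Probe health on the Inference API (port 8080): prints 200 while healthy,
# 500 once initialization has failed past the retry timeout.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/ping
```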
Q3. How to check model status (without using the /ping endpoint)
Since the Inference API (port 8080)'s /ping endpoint only reflects frontend (Java server) health, use the Management API (port 8081)'s describe-model endpoint, /models/<model-name>, instead.
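For example, with a placeholder model name my_model and the default management port:

```bash
# Describe the model on the Management API (port 8081); the JSON response
# includes each worker's status (e.g. READY), so a failed model shows up here.
curl http://localhost:8081/models/my_model
```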