kserve / modelmesh

Distributed Model Serving Framework

Log reason for unready model mesh container

Legion2 opened this issue

I'm currently debugging model mesh containers that are constantly unready, and the logs do not directly indicate the problem. Only by reading the code did I discover abortStartup, which fails the readiness probe without logging the reason:

if (abortStartup) {
    return false;
}

ModelMesh.abortStartup indicates an unrecoverable failure, which can only be resolved by restarting the model mesh container.
If the readiness probe fails because of abortStartup, the reason should be logged so that the root cause of the issue can be debugged.

Alternatively, the liveness probe of the container should fail to indicate an unrecoverable application failure.
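
To make the two proposals concrete, here is a minimal sketch of what they could look like. It is not the actual ModelMesh code: the class ReadinessProbeSketch, the abortReason field, and the abort, isReady and isLive methods are hypothetical names invented for illustration, and only the abortStartup flag is taken from the snippet above.

```java
import java.util.logging.Logger;

// Hypothetical stand-in for the part of ModelMesh that answers probes.
class ReadinessProbeSketch {
    private static final Logger logger =
            Logger.getLogger(ReadinessProbeSketch.class.getName());

    // Mirrors the abortStartup flag from the issue; abortReason is an assumption.
    private volatile boolean abortStartup;
    private volatile String abortReason;

    // Record why startup was aborted before raising the flag,
    // so a probe that sees the flag also sees the reason.
    public void abort(String reason) {
        abortReason = reason;
        abortStartup = true;
    }

    // Proposal 1: log the reason whenever readiness fails due to abortStartup.
    public boolean isReady() {
        if (abortStartup) {
            logger.severe("Readiness check failed: startup was aborted ("
                    + abortReason + "); a container restart is required");
            return false;
        }
        return true;
    }

    // Proposal 2: report the unrecoverable state through the liveness probe,
    // so that the orchestrator restarts the container.
    public boolean isLive() {
        return !abortStartup;
    }
}
```

Because probes run periodically, this sketch would log the message on every failed readiness check; a real change might prefer to log it once or at a reduced rate.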

@Legion2 -- would you like to propose a code change in a PR?