`mlx-ui` pod fails to start up on OpenShift
ckadner opened this issue
Describe the bug
After deploying MLX on OpenShift (4.8 and 4.10, on either IBM Cloud or Fyre):
# export MLX_DEPLOYMENT_TYPE=mlx-single-ibmcloud-openshift
export MLX_DEPLOYMENT_TYPE=mlx-single-fyre-openshift
git clone https://github.com/IBM/manifests -b v1.5-branch && cd manifests
while ! kustomize build ${MLX_DEPLOYMENT_TYPE} | \
kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
the `mlx-ui` pod fails to start up:
NAME READY STATUS RESTARTS AGE
cache-deployer-deployment-798dc7d98b-9c4sj 1/1 Running 0 83s
cache-server-86f59c8696-d499g 0/1 ContainerCreating 0 83s
kfp-csi-s3-4srhx 0/2 ContainerCreating 0 81s
kfp-csi-s3-9bqft 0/2 ContainerCreating 0 81s
kfp-csi-s3-gklqr 0/2 ContainerCreating 0 81s
metadata-envoy-deployment-5b4856dd5-m6t4m 1/1 Running 0 83s
metadata-grpc-deployment-6b5685488-gnszx 1/1 Running 0 83s
metadata-writer-9f698fdcb-x47pd 1/1 Running 0 83s
minio-5b65df66c9-d257k 1/1 Running 0 83s
ml-pipeline-77b7b79565-p2wfq 1/1 Running 0 83s
ml-pipeline-persistenceagent-684f664fb7-q255d 1/1 Running 0 83s
ml-pipeline-scheduledworkflow-5dfcf96788-6mp2n 1/1 Running 0 82s
ml-pipeline-ui-6dfcc5c664-pkgbr 1/1 Running 0 82s
ml-pipeline-viewer-crd-5878c6454f-mk92c 1/1 Running 0 82s
ml-pipeline-visualizationserver-6876996cdd-s4qvd 1/1 Running 0 82s
mlx-api-7f46b6df4f-xdvzw 1/1 Running 0 82s
mlx-ui-7fbbbf6cbb-hll4z 0/1 Error 3 82s
mysql-f7b9b7dd4-75l2q 1/1 Running 0 82s
We can see exit code 243 in `oc describe pod mlx-ui-7fbbbf6cbb-hll4z`:
Containers:
mlx-ui:
Container ID: cri-o://5d9d1caa2f3544a78c8b0e2cdc9cba9fc495a7c108ee3443220b417ca8c55d4b
Image: mlexchange/mlx-ui:nightly-origin-main
Image ID: docker.io/mlexchange/mlx-ui@sha256:70aa61ce62caeeeeaa549420c4684b5e0edb3dc96a8151b11f15939c5fe14152
Port: 3000/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 243
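To pull just the exit code out of the `describe` output when checking several clusters, something like the following may help. It is only a sketch: the helper name `last_exit_code` is made up for illustration, and it assumes the `Exit Code:` line appears as in the output above.

```shell
# Hypothetical helper: print the first "Exit Code:" value found in
# `oc describe pod` output supplied on stdin.
last_exit_code() {
  awk '/Exit Code:/ { print $3; exit }'
}

# Possible usage against a live cluster (pod name taken from this report):
#   oc describe pod mlx-ui-7fbbbf6cbb-hll4z | last_exit_code
# The logs of the crashed container can be fetched with:
#   oc logs --previous mlx-ui-7fbbbf6cbb-hll4z
```

The `--previous` flag asks for the logs of the last terminated container instance, which is usually where the reason for the non-zero exit shows up.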
After deleting the `mlx-ui` pod, the replacement `mlx-ui` pod comes up fine:
$ oc get pods | grep mlx-ui
mlx-ui-7fbbbf6cbb-hll4z 0/1 CrashLoopBackOff 7 13m
$ oc delete pod mlx-ui-7fbbbf6cbb-hll4z
pod "mlx-ui-7fbbbf6cbb-hll4z" deleted
$ oc get pods | grep mlx-ui
mlx-ui-7fbbbf6cbb-r5kxh 1/1 Running 0 16s
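Until the root cause is fixed, the manual workaround above could be scripted roughly as follows. This is a sketch, not part of the MLX manifests: the function name `failing_mlx_ui_pods` is hypothetical, and it assumes the pod name starts with `mlx-ui-` and that the third column of `oc get pods` is the status, as in the listings in this report.

```shell
# Hypothetical helper: read `oc get pods --no-headers` output on stdin and
# print the names of mlx-ui pods whose STATUS column is not "Running".
failing_mlx_ui_pods() {
  awk '$1 ~ /^mlx-ui-/ && $3 != "Running" { print $1 }'
}

# Possible usage against a live cluster (assumes the current project is the
# one MLX was deployed into):
#   oc get pods --no-headers | failing_mlx_ui_pods |
#     while read -r pod; do oc delete pod "$pod"; done
```

Matching on the name prefix mirrors the `grep mlx-ui` used above. A label selector would be more robust, but the labels on the `mlx-ui` deployment are not shown in this report.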
Thanks to @jbusche for verifying that this error occurs consistently across various OpenShift deployments.