failed to get sandbox runtime: no runtime for "spin" is configured (vanilla Kubernetes)
hotspoons opened this issue
I have the spin shim installed on my worker nodes:
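On each node this can be verified with something along these lines (assuming the default binary name `containerd-shim-spin-v1` installed under `/usr/local/bin`; adjust to your install):

```shell
# Confirm the spin shim binary is present and executable on the node.
# containerd resolves a runtime_type of "io.containerd.spin.v1" to a binary
# named containerd-shim-spin-v1, which must be somewhere containerd can find it.
ls -l /usr/local/bin/containerd-shim-spin-v1
which containerd-shim-spin-v1
```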
And the spin containerd plugin configured on my worker nodes:
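The `config.toml` entry is roughly the following (a sketch assuming containerd 1.6+ with the CRI plugin and the v1 shim naming; the runtime name registered here, `spin`, is what the error message refers to):

```toml
# Registers a "spin" runtime with the CRI plugin; the runtime_type must match
# the installed shim binary (io.containerd.spin.v1 -> containerd-shim-spin-v1).
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.spin]
  runtime_type = "io.containerd.spin.v1"
```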
The runtime class configured on my cluster:
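It is along these lines (the handler name is inferred from the error message, and the RuntimeClass name matches what the deployment below references):

```yaml
# RuntimeClass referenced by the deployment below; its handler ("spin") must
# match the runtime name registered in containerd's config.toml.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: wasmtime-spin-v1
handler: spin
```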
This is the deployment config for the hello world app I was trying to get working:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"name":"wasm-spin","namespace":"default"},"spec":{"replicas":3,"selector":{"matchLabels":{"app":"wasm-spin"}},"template":{"metadata":{"labels":{"app":"wasm-spin"}},"spec":{"containers":[{"image":"ghcr.io/deislabs/containerd-wasm-shims/examples/spin-rust-hello:latest","name":"testwasm"}],"runtimeClassName":"wasmtime-spin-v1"}}}}
  creationTimestamp: "2023-10-11T01:42:09Z"
  generation: 1
  name: wasm-spin
  namespace: default
  resourceVersion: "75957"
  uid: 3104ba94-3ceb-496c-b7b3-23e6472500f3
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: wasm-spin
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: wasm-spin
    spec:
      containers:
      - image: ghcr.io/deislabs/containerd-wasm-shims/examples/spin-rust-hello:latest
        imagePullPolicy: Always
        name: testwasm
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      runtimeClassName: wasmtime-spin-v1
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastTransitionTime: "2023-10-11T01:42:09Z"
    lastUpdateTime: "2023-10-11T01:42:09Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2023-10-11T01:42:09Z"
    lastUpdateTime: "2023-10-11T01:56:03Z"
    message: ReplicaSet "wasm-spin-58db6df759" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing
  observedGeneration: 1
  replicas: 3
  unavailableReplicas: 3
  updatedReplicas: 3
```
But this is what I get when trying to deploy any of the pods:
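The pods never become available, and describing them shows the error from the title in their events, roughly:

```shell
# Inspect the failing pods; the sandbox creation event carries the error.
kubectl describe pods -l app=wasm-spin
#   Warning  FailedCreatePodSandBox  kubelet
#   Failed to create pod sandbox: ... failed to get sandbox runtime:
#   no runtime for "spin" is configured
```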
Did I miss something? I tried with the Fermyon Helm chart and the binaries from here; same result. Any help is appreciated. Thanks!
Sorry for the late reply. Given the screenshots, I am not entirely sure what the issue is. Hence a few questions from my side to aid debugging:
- Could you please check whether the shim binary is actually on the PATH of the worker nodes, where containerd can find it?
- Have you restarted containerd after changing its config.toml (e.g. as sketched below)?
- Could you also check the containerd logs for anything interesting? Please paste them here if you find something.
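For reference, assuming containerd runs as a systemd unit on your worker nodes, those two checks look roughly like this:

```shell
# Pick up config.toml changes by restarting containerd (systemd-managed).
sudo systemctl restart containerd

# Scan recent containerd logs for shim/runtime-related errors.
sudo journalctl -u containerd --since "1 hour ago" | grep -iE "spin|shim|runtime"
```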
Thank you for the help! I dug into the containerd logs as you recommended, and that led me down a rabbit hole that ended up on the Nvidia GPU operator GitHub issues page, where I found this issue; ultimately, following the Configure cgroups section from this guide made the wasm shim work for my configuration.
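To give a rough idea, one common form of this change on EL-family nodes (a sketch only, not necessarily the exact steps from that guide, and details vary by distribution and containerd version) is aligning containerd and the kubelet on the systemd cgroup driver:

```toml
# Sketch: enable the systemd cgroup driver in containerd's config.toml
# (the kubelet must be configured with cgroupDriver: systemd to match).
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
```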
Since this issue will be present on any out-of-the-box Enterprise Linux variant (e.g. Red Hat, CentOS, Rocky, Alma) Kubernetes cluster, I would be happy to open a PR adding a note of warning to your documentation; hopefully it will save some frustration for anyone who comes after me. Thank you!
> I would be happy to open a PR to add a note of warning to your documentation
That would be great! I am extremely happy to hear that you were able to figure out the issue and get it working! 🥳