deislabs / containerd-wasm-shims

containerd shims for running WebAssembly workloads in Kubernetes

failed to get sandbox runtime: no runtime for "spin" is configured (vanilla Kubernetes)

hotspoons opened this issue

I have the spin shim installed on my worker nodes:
[screenshot: the spin shim binary installed on a worker node]
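
(The screenshot isn't reproduced here. For context, installing the shim usually just means placing the binary somewhere on the PATH of the containerd process; the binary name and destination below are assumptions based on the v1 spin shim, not details taken from the screenshot.)

# copy the spin shim next to the containerd binary, or into any directory on containerd's PATH
sudo install -m 755 containerd-shim-spin-v1 /usr/local/bin/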

And the spin containerd plugin configured on my worker nodes:
[screenshot: the spin runtime entry in containerd's config.toml]
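
(For reference, the containerd runtime entry for the v1 spin shim usually looks like the snippet below; this is an assumption about what the screenshot shows, and the runtime_type string differs for newer shim versions.)

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.spin]
  # containerd resolves this runtime_type to a binary named containerd-shim-spin-v1 on its PATH
  runtime_type = "io.containerd.spin.v1"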

The runtime class configured on my cluster:
[screenshot: the wasmtime-spin-v1 RuntimeClass]
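
(Again assuming the screenshot, a RuntimeClass matching the names used elsewhere in this issue would look roughly like this; the handler must match the runtime name in containerd's config.toml, and metadata.name is what the deployment references via runtimeClassName.)

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: wasmtime-spin-v1
handler: spin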

This is the deployment config for the hello world app I was trying to get working:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"name":"wasm-spin","namespace":"default"},"spec":{"replicas":3,"selector":{"matchLabels":{"app":"wasm-spin"}},"template":{"metadata":{"labels":{"app":"wasm-spin"}},"spec":{"containers":[{"image":"ghcr.io/deislabs/containerd-wasm-shims/examples/spin-rust-hello:latest","name":"testwasm"}],"runtimeClassName":"wasmtime-spin-v1"}}}}
  creationTimestamp: "2023-10-11T01:42:09Z"
  generation: 1
  name: wasm-spin
  namespace: default
  resourceVersion: "75957"
  uid: 3104ba94-3ceb-496c-b7b3-23e6472500f3
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: wasm-spin
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: wasm-spin
    spec:
      containers:
      - image: ghcr.io/deislabs/containerd-wasm-shims/examples/spin-rust-hello:latest
        imagePullPolicy: Always
        name: testwasm
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      runtimeClassName: wasmtime-spin-v1
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastTransitionTime: "2023-10-11T01:42:09Z"
    lastUpdateTime: "2023-10-11T01:42:09Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2023-10-11T01:42:09Z"
    lastUpdateTime: "2023-10-11T01:56:03Z"
    message: ReplicaSet "wasm-spin-58db6df759" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing
  observedGeneration: 1
  replicas: 3
  unavailableReplicas: 3
  updatedReplicas: 3

But this is what I get when trying to deploy any of the pods:

[screenshot: pod events showing the error from the issue title, failed to get sandbox runtime: no runtime for "spin" is configured]

Did I miss something? I tried both the Fermyon Helm chart and the binaries from here, with the same result. Any help is appreciated. Thanks!

Sorry for the late reply. Given the screenshots, I am not entirely sure what the issue is, so here are a few questions from my side to aid debugging (a quick way to run these checks is sketched below the list):

  1. Could you please check that the shim binary is actually on the PATH of the worker nodes, where containerd can find it?
  2. Have you restarted containerd after changing its config.toml?
  3. Could you also check the containerd logs to see if there is anything interesting? Please paste anything relevant here.
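
For items 1-3, something along these lines on each worker node should do it (the binary name containerd-shim-spin-v1 is an assumption based on the v1 spin shim; note that a systemd unit may see a narrower PATH than your interactive shell):

# 1. confirm the shim binary is on a path containerd can see
command -v containerd-shim-spin-v1

# 2. restart containerd after editing /etc/containerd/config.toml
sudo systemctl restart containerd

# 3. follow containerd's logs while the pod is scheduled onto the node
sudo journalctl -u containerd -f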

Thank you for the help! I dug into the containerd logs as you recommended, and that led me down a rabbit hole that ended on the Nvidia GPU operator GitHub issues page, where I found this issue; ultimately, following the Configure cgroups section from this guide made the wasm shim work for my configuration.
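
(The guide itself isn't quoted in this thread, so the snippet below is only a sketch of what a "Configure cgroups" step on an Enterprise Linux node commonly involves: switching containerd to the systemd cgroup driver so it matches the kubelet's cgroupDriver: systemd setting. It is an assumption about the fix, not a quote from the guide.)

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    # use the systemd cgroup driver instead of the cgroupfs default
    SystemdCgroup = true

followed by a containerd restart (sudo systemctl restart containerd) for the change to take effect.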

Since this issue will be present on any out-of-the-box Enterprise Linux variant (e.g. Red Hat, CentOS, Rocky, Alma) Kubernetes cluster, I would be happy to open a PR to add a note of warning to your documentation; hopefully it will save some frustration for anyone who comes after me. Thank you!

I would be happy to open a PR to add a note of warning to your documentation

That would be great! I am extremely happy to hear that you were able to figure out the issue and get it working! 🥳