deislabs / containerd-wasm-shims

containerd shims for running WebAssembly workloads in Kubernetes

failed to get sandbox runtime: no runtime for "spin" is configured (vanilla Kubernetes)

hotspoons opened this issue

I have the spin shim installed on my worker nodes:
[screenshot: the spin shim binary installed on a worker node]
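
(The screenshot isn't reproduced here. For context, installing the shim usually just means placing the binary somewhere on the PATH of the containerd process; the binary name and destination below are assumptions based on the v1 spin shim, not details taken from the screenshot.)

# copy the spin shim next to the containerd binary, or into any directory on containerd's PATH
sudo install -m 755 containerd-shim-spin-v1 /usr/local/bin/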

And the spin containerd plugin configured on my worker nodes:
[screenshot: the spin runtime entry in containerd's config.toml]
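
(For reference, the containerd runtime entry for the v1 spin shim usually looks like the snippet below; this is an assumption about what the screenshot shows, and the runtime_type string differs for newer shim versions.)

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.spin]
  # containerd resolves this runtime_type to a binary named containerd-shim-spin-v1 on its PATH
  runtime_type = "io.containerd.spin.v1"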

The runtime class configured on my cluster:
[screenshot: the wasmtime-spin-v1 RuntimeClass]
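
(Again assuming the screenshot, a RuntimeClass matching the names used elsewhere in this issue would look roughly like this; the handler must match the runtime name in containerd's config.toml, and metadata.name is what the deployment references via runtimeClassName.)

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: wasmtime-spin-v1
handler: spin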

This is the deployment config for the hello world app I was trying to get working:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"name":"wasm-spin","namespace":"default"},"spec":{"replicas":3,"selector":{"matchLabels":{"app":"wasm-spin"}},"template":{"metadata":{"labels":{"app":"wasm-spin"}},"spec":{"containers":[{"image":"ghcr.io/deislabs/containerd-wasm-shims/examples/spin-rust-hello:latest","name":"testwasm"}],"runtimeClassName":"wasmtime-spin-v1"}}}}
  creationTimestamp: "2023-10-11T01:42:09Z"
  generation: 1
  name: wasm-spin
  namespace: default
  resourceVersion: "75957"
  uid: 3104ba94-3ceb-496c-b7b3-23e6472500f3
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: wasm-spin
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: wasm-spin
    spec:
      containers:
      - image: ghcr.io/deislabs/containerd-wasm-shims/examples/spin-rust-hello:latest
        imagePullPolicy: Always
        name: testwasm
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      runtimeClassName: wasmtime-spin-v1
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastTransitionTime: "2023-10-11T01:42:09Z"
    lastUpdateTime: "2023-10-11T01:42:09Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2023-10-11T01:42:09Z"
    lastUpdateTime: "2023-10-11T01:56:03Z"
    message: ReplicaSet "wasm-spin-58db6df759" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing
  observedGeneration: 1
  replicas: 3
  unavailableReplicas: 3
  updatedReplicas: 3

But this is what I get when trying to deploy any of the pods:

[screenshot: pod events showing the error from the issue title, failed to get sandbox runtime: no runtime for "spin" is configured]

Did I miss something? I tried both the Fermyon Helm chart and the binaries from here, with the same result. Any help is appreciated. Thanks!

Sorry for the late reply. Given the screenshots, I am not entirely sure what the issue is, so here are a few questions from my side to aid debugging (a quick way to run these checks is sketched below the list):

  1. Could you please check that the shim binary is actually on the PATH of the worker nodes, where containerd can find it?
  2. Have you restarted containerd after changing its config.toml?
  3. Could you also check the containerd logs to see if there is anything interesting? Please paste anything relevant here.
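
For items 1-3, something along these lines on each worker node should do it (the binary name containerd-shim-spin-v1 is an assumption based on the v1 spin shim; note that a systemd unit may see a narrower PATH than your interactive shell):

# 1. confirm the shim binary is on a path containerd can see
command -v containerd-shim-spin-v1

# 2. restart containerd after editing /etc/containerd/config.toml
sudo systemctl restart containerd

# 3. follow containerd's logs while the pod is scheduled onto the node
sudo journalctl -u containerd -f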

Thank you for the help! I dug into the containerd logs as you recommended, and that led me down a rabbit hole that ended on the Nvidia GPU operator GitHub issues page, where I found this issue; ultimately, following the Configure cgroups section from this guide made the wasm shim work for my configuration.
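
(The guide itself isn't quoted in this thread, so the snippet below is only a sketch of what a "Configure cgroups" step on an Enterprise Linux node commonly involves: switching containerd to the systemd cgroup driver so it matches the kubelet's cgroupDriver: systemd setting. It is an assumption about the fix, not a quote from the guide.)

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    # use the systemd cgroup driver instead of the cgroupfs default
    SystemdCgroup = true

followed by a containerd restart (sudo systemctl restart containerd) for the change to take effect.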

Since this issue will be present on any out-of-the-box Enterprise Linux variant (e.g. Red Hat, CentOS, Rocky, Alma) Kubernetes cluster, I would be happy to open a PR to add a note of warning to your documentation; hopefully it will save some frustration for anyone who comes after me. Thank you!

I would be happy to open a PR to add a note of warning to your documentation

That would be great! I am extremely happy to hear that you were able to figure out the issue and get it working! 🥳