deislabs / containerd-wasm-shims

containerd shims for running WebAssembly workloads in Kubernetes

v0.11.0 image pulls fail with `ErrImagePull` error

Mossaka opened this issue · comments

  Type     Reason            Age                     From               Message
  ----     ------            ----                    ----               -------
  Warning  FailedScheduling  7m25s                   default-scheduler  0/3 nodes are available: 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
  Normal   Scheduled         7m23s                   default-scheduler  Successfully assigned default/wasm-lunatic-764c4f46d4-5b4zt to k3d-wasm-cluster-agent-1
  Normal   Pulling           5m51s (x4 over 7m22s)   kubelet            Pulling image "ghcr.io/deislabs/containerd-wasm-shims/examples/lunatic-submillisecond:v0.11.0"
  Warning  Failed            5m51s (x4 over 7m22s)   kubelet            Failed to pull image "ghcr.io/deislabs/containerd-wasm-shims/examples/lunatic-submillisecond:v0.11.0": rpc error: code = NotFound desc = failed to pull and unpack image "ghcr.io/deislabs/containerd-wasm-shims/examples/lunatic-submillisecond:v0.11.0": no match for platform in manifest: not found
  Warning  Failed            5m51s (x4 over 7m22s)   kubelet            Error: ErrImagePull
  Warning  Failed            5m38s (x6 over 7m21s)   kubelet            Error: ImagePullBackOff
  Normal   BackOff           2m15s (x20 over 7m21s)  kubelet            Back-off pulling image "ghcr.io/deislabs/containerd-wasm-shims/examples/lunatic-submillisecond:v0.11.0"

I think I would have to change the platform back to amd64 and arm64 for images that we publish.

https://github.com/deislabs/containerd-wasm-shims/blob/main/.github/workflows/docker-build-push.yaml#L79
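For context, a rough sketch of what publishing for those platforms could look like as a plain buildx invocation (the repository's workflow uses a GitHub Actions build step; the tag and build context here are illustrative, not the workflow's actual values):

  # Illustrative only: build and push the example image for linux/amd64 and
  # linux/arm64 instead of wasi/wasm.
  docker buildx build \
    --platform linux/amd64,linux/arm64 \
    --tag ghcr.io/deislabs/containerd-wasm-shims/examples/lunatic-submillisecond:v0.11.0 \
    --push .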

What do you think? @devigned , @jsturtevant

I believe you are correct about the origin of the error, but I'm not sure about the platform/arch. @jsturtevant wdyt?

It looks like this is because it is built as an image index by buildx:

 regctl manifest get ghcr.io/deislabs/containerd-wasm-shims/examples/lunatic-submillisecond:v0.11.0
Name:                            ghcr.io/deislabs/containerd-wasm-shims/examples/lunatic-submillisecond:v0.11.0
MediaType:                       application/vnd.oci.image.index.v1+json
Digest:                          sha256:40c27dda0433770fb84cf8404a0904288e557db74dda2e42d6f70efd09aba82f

Manifests:

  Name:                          ghcr.io/deislabs/containerd-wasm-shims/examples/lunatic-submillisecond:v0.11.0@sha256:cd20f3be0de911ad8eb7551ef4ccdfc16040ad35a47f7633786b80d3a6b76598
  Digest:                        sha256:cd20f3be0de911ad8eb7551ef4ccdfc16040ad35a47f7633786b80d3a6b76598
  MediaType:                     application/vnd.oci.image.manifest.v1+json
  Platform:                      wasi/wasm

  Name:                          ghcr.io/deislabs/containerd-wasm-shims/examples/lunatic-submillisecond:v0.11.0@sha256:f587a7a300a7ec1b7f0c404bca38c337dfb79e6f2b93238daac25f963e5008c6
  Digest:                        sha256:f587a7a300a7ec1b7f0c404bca38c337dfb79e6f2b93238daac25f963e5008c6
  MediaType:                     application/vnd.oci.image.manifest.v1+json
  Platform:                      unknown/unknown
  Annotations:
    vnd.docker.reference.digest: sha256:cd20f3be0de911ad8eb7551ef4ccdfc16040ad35a47f7633786b80d3a6b76598
    vnd.docker.reference.type:   attestation-manifest

Switching it back to amd64 for the platform arch would allow this to be pulled as an index; otherwise, I believe disabling attestation in buildx would produce a single image manifest instead of an index, and it would pull properly.
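If the wasi/wasm platform is kept instead, here is a hedged sketch of that attestation change as a plain buildx invocation (again, the tag and build context are illustrative; the workflow would set the equivalent provenance option in its build step):

  # Sketch: --provenance=false keeps buildx from attaching the attestation
  # manifest, so the push produces a single image manifest rather than an index.
  docker buildx build \
    --platform wasi/wasm \
    --provenance=false \
    --tag ghcr.io/deislabs/containerd-wasm-shims/examples/lunatic-submillisecond:v0.11.0 \
    --push .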

The other option would be to use the digest directly like: ghcr.io/deislabs/containerd-wasm-shims/examples/lunatic-submillisecond:v0.11.0@sha256:cd20f3be0de911ad8eb7551ef4ccdfc16040ad35a47f7633786b80d3a6b76598
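As a rough illustration of that workaround, assuming the Deployment from the events above and a hypothetical container name of "lunatic":

  # Point the existing Deployment at the digest-pinned reference; the container
  # name "lunatic" is an assumption, not taken from the actual manifest.
  kubectl set image deployment/wasm-lunatic \
    lunatic=ghcr.io/deislabs/containerd-wasm-shims/examples/lunatic-submillisecond:v0.11.0@sha256:cd20f3be0de911ad8eb7551ef4ccdfc16040ad35a47f7633786b80d3a6b76598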

Can I ask that in the future, if there are problems with a release, we follow convention and issue a new patch release?

Tags and releases are generally considered write-once. There are excellent reasons for not replacing an existing release in-line, even if it's completely broken. For example: this PR which passed initially but began failing once v0.11.0 suddenly meant something different:
kubernetes-sigs/image-builder#1405

It was minutes from merging, and would have been DOA if we had merged it.

I can update it to match new SHAs, but TBH I have low confidence that will work, and my incentive to keep up with releases is reduced. (This isn't the first time a wasm-shims release has been updated inline with new binaries.)

Is there an additional test or release gate we could add to make sure this doesn't happen in the future? I'd love to help!

Also it appears the v0.11.0 release is incomplete now: it only has three of the expected eight binary packages.

(screenshot of the v0.11.0 release assets)

Can I ask that in the future, if there are problems with a release, we follow convention and issue a new patch release?

My bad. Yes, of course.

Is there an additional test or release gate we could add to make sure this doesn't happen in the future? I'd love to help!

In the future, given the high probability that a release could fail, I am going to push release candidates first, verify they work, and then push the main tag. Does this sound like a plausible approach?
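As a starting point for such a gate, a minimal sketch using regctl (as shown above) to check that each candidate image resolves for a platform the kubelet will actually request; the rc tag, the image names other than lunatic-submillisecond, and the use of regctl's --platform flag here are assumptions to verify, not the repository's authoritative setup:

  # Hedged sketch: fail the release if any example image cannot be resolved for
  # linux/amd64. The image list and rc tag are illustrative placeholders.
  for img in lunatic-submillisecond spin-hello-world slight-hello-world wws-hello-world; do
    regctl manifest get \
      "ghcr.io/deislabs/containerd-wasm-shims/examples/${img}:v0.11.1-rc.1" \
      --platform linux/amd64 >/dev/null || { echo "gate failed: ${img}"; exit 1; }
  done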

Does this sound like a plausible approach?

It does indeed, sorry there's so much manual work involved.

(I'm also planning to write an end-to-end test to verify that the wasm-shims are working in the real world--Cluster API for Azure--but that wouldn't catch problems until much later.)

Just released v0.11.1