containerd / nerdctl

contaiNERD CTL - Docker-compatible CLI for containerd, with support for Compose, Rootless, eStargz, OCIcrypt, IPFS, ...

`networks` related code is racy

apostasie opened this issue

Description

For some reason, network tests are very racy on my rig.

Taking just network_remove_linux_test, I quickly hit several different failures:

  • task xyz not found: not found
  • reading /etc/cni/net.d/nerdctl-nerdctl-testnetworkremovebyid.conflist: open /etc/cni/net.d/nerdctl-nerdctl-testnetworkremovebyid.conflist: no such file or directory
  • failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: time=\"2024-06-13T03:36:47Z\" level=fatal msg=\"no such network: \\\"nerdctl-testnetworkprune\\\"\"\nFailed to write to log, write /var/lib/nerdctl/1935db59/containers/nerdctl-testnetworkprune/9c8747e53f2fa1cba99145d19f48648a85d63ccf5ae7167038a3536336c2d6aa/oci-hook.startContainer.log: file already closed: unknown

From a cursory reading, it feels to me like number 1 comes from somewhere in netutils that makes assumptions about the availability of objects.

Number 2 is probably also in netutils: it looks like a race between checking that a network exists and a later operation that depends on reading its config.
It could be that certain operations do not use the filelock (properly).

Number 3 is more worrisome.

Steps to reproduce the issue

go test

Describe the results you received and expected

The tests fail 1 out of 10 times, for a variety of different reasons.

What version of nerdctl are you using?

1.7.6

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

No response