concourse / concourse

Concourse is a container-based continuous thing-doer written in Go.

Home Page: https://concourse-ci.org

Docker failed to start within 120 seconds

radusw opened this issue

Summary

Docker fails to start and times out after 120 seconds.

Steps to reproduce

Upgrade Concourse from 7.9.1 to 7.11.2.

Expected results

Pulling the task image should succeed.

Actual results

waiting for docker to come up...
waiting for docker to come up...
waiting for docker to come up...
...
waiting for docker to come up...
waiting for docker to come up...
time="2024-02-09T00:00:00.521833713Z" level=info msg="Starting up"
time="2024-02-09T00:00:00.522310049Z" level=info msg="containerd not running, starting managed containerd"
time="2024-02-09T00:00:00.522962574Z" level=info msg="started new containerd process" address=/var/run/docker/containerd/containerd.sock module=libcontainerd pid=3080
time="2024-02-09T00:00:00.540329464Z" level=info msg="starting containerd" revision=ae07eda36dd25f8a1b98dfbf587313b99c0190bb version=1.6.28
time="2024-02-09T00:00:00.554332913Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
time="2024-02-09T00:00:00.554370012Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.aufs\"..." type=io.containerd.snapshotter.v1
time="2024-02-09T00:00:00.554619668Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.aufs\"..." error="aufs is not supported (modprobe aufs failed: exec: \"modprobe\": executable file not found in $PATH \"\"): skip plugin" type=io.containerd.snapshotter.v1
time="2024-02-09T00:00:00.554656334Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." type=io.containerd.snapshotter.v1
time="2024-02-09T00:00:00.554954671Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.btrfs\"..." error="path /scratch/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs (xfs) must be a btrfs filesystem to be used with the btrfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
time="2024-02-09T00:00:00.554981081Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.devmapper\"..." type=io.containerd.snapshotter.v1
time="2024-02-09T00:00:00.554995051Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured"
time="2024-02-09T00:00:00.555004498Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.native\"..." type=io.containerd.snapshotter.v1
time="2024-02-09T00:00:00.555033703Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." type=io.containerd.snapshotter.v1
time="2024-02-09T00:00:00.555125320Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.zfs\"..." type=io.containerd.snapshotter.v1
time="2024-02-09T00:00:00.555231180Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.zfs\"..." error="path /scratch/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
time="2024-02-09T00:00:00.555243422Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
time="2024-02-09T00:00:00.555257737Z" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
time="2024-02-09T00:00:00.555265296Z" level=info msg="metadata content store policy set" policy=shared
time="2024-02-09T00:00:00.555352408Z" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1
time="2024-02-09T00:00:00.555365271Z" level=info msg="loading plugin \"io.containerd.event.v1.exchange\"..." type=io.containerd.event.v1
time="2024-02-09T00:00:00.555374226Z" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1
time="2024-02-09T00:00:00.555395594Z" level=info msg="loading plugin \"io.containerd.warning.v1.deprecations\"..." type=io.containerd.warning.v1
time="2024-02-09T00:00:00.555405878Z" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1
time="2024-02-09T00:00:00.555416969Z" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
time="2024-02-09T00:00:00.555427365Z" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
time="2024-02-09T00:00:00.555437432Z" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
time="2024-02-09T00:00:00.555447604Z" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
time="2024-02-09T00:00:00.555464635Z" level=info msg="loading plugin \"io.containerd.service.v1.leases-service\"..." type=io.containerd.service.v1
time="2024-02-09T00:00:00.555474671Z" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
time="2024-02-09T00:00:00.555484227Z" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
time="2024-02-09T00:00:00.555493653Z" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." type=io.containerd.runtime.v1
time="2024-02-09T00:00:00.555528728Z" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
time="2024-02-09T00:00:00.555561482Z" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
time="2024-02-09T00:00:00.555851601Z" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
time="2024-02-09T00:00:00.555881577Z" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
time="2024-02-09T00:00:00.555894311Z" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
time="2024-02-09T00:00:00.555941655Z" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
time="2024-02-09T00:00:00.555953338Z" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
time="2024-02-09T00:00:00.555963997Z" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
time="2024-02-09T00:00:00.555973947Z" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
time="2024-02-09T00:00:00.555983762Z" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
time="2024-02-09T00:00:00.555994084Z" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
time="2024-02-09T00:00:00.556003797Z" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
time="2024-02-09T00:00:00.556013948Z" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
time="2024-02-09T00:00:00.556024832Z" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1
time="2024-02-09T00:00:00.556058127Z" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
time="2024-02-09T00:00:00.556068724Z" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
time="2024-02-09T00:00:00.556079229Z" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
time="2024-02-09T00:00:00.556091285Z" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
time="2024-02-09T00:00:00.556104562Z" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
time="2024-02-09T00:00:00.556113304Z" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
time="2024-02-09T00:00:00.556131980Z" level=error msg="failed to initialize a tracing processor \"otlp\"" error="no OpenTelemetry endpoint: skip plugin"
time="2024-02-09T00:00:00.556362417Z" level=info msg=serving... address=/var/run/docker/containerd/containerd-debug.sock
time="2024-02-09T00:00:00.556421365Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock.ttrpc
time="2024-02-09T00:00:00.556465113Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock
time="2024-02-09T00:00:00.556486871Z" level=info msg="containerd successfully booted in 0.016665s"
time="2024-02-09T00:00:01.607322673Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
time="2024-02-09T00:00:01.607489913Z" level=info msg="Loading containers: start."
time="2024-02-09T00:00:01.607610903Z" level=warning msg="Running modprobe bridge br_netfilter failed with message: , error: exec: \"modprobe\": executable file not found in $PATH"
time="2024-02-09T00:00:01.727751658Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
time="2024-02-09T00:00:01.728102784Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
time="2024-02-09T00:00:01.728106603Z" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
time="2024-02-09T00:00:02.083977519Z" level=info msg="Processing signal 'terminated'"
Docker failed to start within 120 seconds.
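
The log above is mostly info-level plugin loading, so the few warning and error lines are easy to miss. Below is a minimal sketch (not part of the original report) that keeps only those lines, assuming the logfmt-style `level=` and `msg=` fields seen in this output. The function name `notable_lines` is hypothetical.

```python
import re

# Field patterns for logfmt-style dockerd/containerd lines, as seen in the log above.
LEVEL_RE = re.compile(r'\blevel=(\w+)')
MSG_RE = re.compile(r'\bmsg=("(?:[^"\\]|\\.)*"|\S+)')


def notable_lines(log_text, levels=("warning", "error", "fatal")):
    """Return (level, msg) pairs for lines whose level is in `levels`.

    Lines without a level= field (e.g. "waiting for docker to come up...")
    are skipped. Escape sequences inside msg are left as-is.
    """
    found = []
    for line in log_text.splitlines():
        level = LEVEL_RE.search(line)
        if level is None or level.group(1) not in levels:
            continue
        msg = MSG_RE.search(line)
        raw = msg.group(1) if msg else ""
        if raw.startswith('"'):
            raw = raw[1:-1]  # drop the surrounding quotes only
        found.append((level.group(1), raw))
    return found
```

For example, `notable_lines(open("dockerd.log").read())` would list the devmapper, otlp, and modprobe messages from a saved copy of this output (`dockerd.log` is an assumed file name). Note that the missing `modprobe` binary shows up only at warning level, shortly before the daemon shuts down.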

Additional context

This works fine on version 7.9.1. It might be related to #8860.

Triaging info

  • Concourse version: 7.11.2
  • Browser (if applicable): n/a
  • Did this used to work? Yes, in 7.9.1

We're seeing this exact same issue on v7.8.3.

In our case, it works when running on the amazon/amzn2-ami-hvm-2.0.20240109.0-x86_64-ebs AMI, but it breaks when SSM Patch Manager updates the kernel to kernel-4.14.336-255.557.amzn2.x86_64.

If we launch new worker instances without the patch, image fetching works as expected, but as soon as Patch Manager patches the worker instances, image fetching breaks.

Yes, I can confirm that the patching was breaking the Concourse workers. We just verified that this was the issue for us as well.

Patch Manager updated our Concourse workers again this morning, but this time it installed a newer kernel version, kernel-4.14.336-256.559.amzn2.x86_64, and the workers are working as expected.

It's possible that this new kernel version fixes the problem that we were running into, but we'll need to keep an eye on it to see if the issue comes back.
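
To keep an eye on which kernel a worker is actually running, a small check like the following could help. It is only a sketch built from the two kernel versions mentioned in this thread. It assumes that `uname -r` (what `platform.release()` returns on Linux) reports the package version without the `kernel-` prefix, and the names `classify_kernel`, `REPORTED_BROKEN`, and `REPORTED_WORKING` are hypothetical.

```python
import platform

# Kernel releases taken from the comments in this thread; nothing else is known.
REPORTED_BROKEN = {"4.14.336-255.557.amzn2.x86_64"}
REPORTED_WORKING = {"4.14.336-256.559.amzn2.x86_64"}


def classify_kernel(release):
    """Classify a kernel release string against the reports in this thread."""
    # Accept either the `uname -r` form or the RPM package name.
    if release.startswith("kernel-"):
        release = release[len("kernel-"):]
    if release in REPORTED_BROKEN:
        return "reported-broken"
    if release in REPORTED_WORKING:
        return "reported-working"
    return "unknown"


if __name__ == "__main__":
    current = platform.release()
    print(current, classify_kernel(current))
```

An "unknown" result says nothing either way; it only means nobody in this thread reported on that kernel.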

We experienced something similar with the linux-esx kernel variant on PhotonOS 3, where the nf_tables module was missing. Using the vanilla linux images was a workaround.
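
A quick way to check for this on a worker host is to look for the module in `/proc/modules`. The sketch below assumes the usual `/proc/modules` layout, where the module name is the first field on each line. The helper names are hypothetical. One caveat: modules compiled directly into the kernel do not appear in `/proc/modules`, so a "missing" result is a hint to investigate rather than proof.

```python
def loaded_modules(proc_modules_text):
    """Parse /proc/modules content into a set of loaded module names."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_modules(required, proc_modules_text):
    """Return the names in `required` that are not listed as loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]
```

On a Linux host this could be called as `missing_modules(["nf_tables", "bridge", "br_netfilter"], open("/proc/modules").read())`. The first name comes from this comment, and the other two are the modules the dockerd log in the original report tried to load via modprobe.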

Closing this issue since the fix appears to be coming from kernel patches and it's not obvious what would need to be fixed within Concourse.