dagger / dagger

An engine to run your pipelines in containers

Home Page: https://dagger.io

🐞 Using dagger with start service sometimes loses its TCP connection

kjuulh opened this issue

What is the issue?

We're using service containers as a basic "local up" environment for our developers. This gives us a basic set of services that are available during development or during a local go test ./... run, both of which can use these databases.

client.
     Container().
     From("rabbitmq").
     WithExposedPort(...).
     AsService().
     Start(ctx)

What we've experienced is that these tests run fine once, but on the second run they cannot form a connection with the service containers. We also set up Postgres, and after the first run psql can no longer connect to that database either.

We've found no logs that point to a degradation in the connection.

Dagger version

dagger v0.10.1 (registry.dagger.io/engine) darwin/arm64

Steps to reproduce

We sadly cannot share the code itself because it is private, but I can probably put together a minimal reproducible example if needed.

However, we basically just need to start a service container, and then in a separate process open a lot of connections to that service via localhost, and do that over and over until the connection degrades. Sometimes it takes a few attempts.

Log output

We got no log output to suggest what went wrong.

service container

32: start docker-entrypoint.sh rabbitmq-server
32: > in service rpgb0ibugd8em
32: [20.1s] 2024-04-25 07:37:33.052228+00:00 [info] <0.841.0> accepting AMQP connection <0.841.0> (10.87.0.1:33074 -> 10.87.0.19:5672)
32: [20.1s] 2024-04-25 07:37:33.055628+00:00 [info] <0.841.0> connection <0.841.0> (10.87.0.1:33074 -> 10.87.0.19:5672): user 'lunar' authenticated and granted access to vhost '/'
WARNING: conn read error error="read tcp [::1]:5432->[::1]:63545: read: connection reset by peer"
WARNING: conn read error error="read tcp [::1]:5432->[::1]:63702: read: connection reset by peer"
WARNING: conn read error error="read tcp [::1]:5432->[::1]:64165: read: connection reset by peer"
WARNING: conn write error error="write tcp [::1]:5672->[::1]:63331: write: broken pipe"
32: [50.1s] 2024-04-25 07:38:03.079579+00:00 [info] <0.841.0> closing AMQP connection <0.841.0> (10.87.0.1:33074 -> 10.87.0.19:5672, vhost: '/', user: 'lunar')
32: [71.2s] 2024-04-25 07:38:24.189303+00:00 [info] <0.865.0> accepting AMQP connection <0.865.0> (10.87.0.1:49040 -> 10.87.0.19:5672)
32: [71.2s] 2024-04-25 07:38:24.192665+00:00 [info] <0.865.0> connection <0.865.0> (10.87.0.1:49040 -> 10.87.0.19:5672): user 'lunar' authenticated and granted access to vhost '/'
WARNING: conn read error error="read tcp [::1]:5432->[::1]:64988: read: connection reset by peer"
WARNING: conn read error error="read tcp [::1]:5432->[::1]:65168: read: connection reset by peer"
WARNING: conn read error error="read tcp [::1]:5432->[::1]:65086: read: connection reset by peer"

engine

time="2024-04-25T07:37:18Z" level=debug msg="Starting new container for 6ph1b82egugpj7j8l2m7tafer with args: [\"docker-entrypoint.sh\" \"postgres\"]"
time="2024-04-25T07:37:18Z" level=debug msg="creating new network namespace fumtaz0cwu4391si181ckz945"
time="2024-04-25T07:37:18Z" level=debug msg="finished creating network namespace fumtaz0cwu4391si181ckz945"
time="2024-04-25T07:37:18Z" level=debug msg="Starting new container for fz7t0lkzaiorsmk07vlvedwt9 with args: [\"check\" \"skd1d114fbjga.6i0fpbd99ci3g.dagger.local\" \"5432/tcp\"]"
time="2024-04-25T07:37:18Z" level=debug msg="returning network namespace tbn4gj3k75v2n1p95416mh6wj from pool"
time="2024-04-25T07:37:18Z" level=debug msg="> creating fz7t0lkzaiorsmk07vlvedwt9 [check skd1d114fbjga.6i0fpbd99ci3g.dagger.local 5432/tcp]"
dnsmasq[35]: read /var/run/containers/cni/dnsname/dagger/addnhosts - 3 names
time="2024-04-25T07:37:18Z" level=debug msg="finished setting up network namespace fumtaz0cwu4391si181ckz945"
time="2024-04-25T07:37:18Z" level=debug msg="> creating 6ph1b82egugpj7j8l2m7tafer [docker-entrypoint.sh postgres]"
time="2024-04-25T07:37:20Z" level=debug msg="session call done" client_call_digest= client_hostname=Kaspers-MBP-2.localdomain client_id=dei5chfcgot883qm2tch2wh8f server_id=3h06plss4crr4bgtqx6hkndd0 spanID=f8aebb6f9254a635 traceID=99614924813587bc4b207489c15b08b4
time="2024-04-25T07:37:20Z" level=debug msg="handling session call" client_call_digest= client_hostname=Kaspers-MBP-2.localdomain client_id=dei5chfcgot883qm2tch2wh8f register_client=false server_id=3h06plss4crr4bgtqx6hkndd0 spanID=92c78ab55db51e2f traceID=e25602b10db79f703ffcb1622c3d3281
time="2024-04-25T07:37:20Z" level=debug msg="forwarding client to server" client_call_digest= client_hostname=Kaspers-MBP-2.localdomain client_id=dei5chfcgot883qm2tch2wh8f server_id=3h06plss4crr4bgtqx6hkndd0 spanID=92c78ab55db51e2f traceID=e25602b10db79f703ffcb1622c3d3281
time="2024-04-25T07:37:25Z" level=debug msg="session call done" client_call_digest= client_hostname=Kaspers-MBP-2.localdomain client_id=dei5chfcgot883qm2tch2wh8f server_id=3h06plss4crr4bgtqx6hkndd0 spanID=4ebb30438b61162c traceID=79f9f1c7a15b0de71027185e685d527f
time="2024-04-25T07:37:25Z" level=debug msg="handling session call" client_call_digest= client_hostname=Kaspers-MBP-2.localdomain client_id=dei5chfcgot883qm2tch2wh8f register_client=false server_id=3h06plss4crr4bgtqx6hkndd0 spanID=1a67c4f8fb872a25 traceID=c8f0a4054b33c5fa7784fd6752282e7f
time="2024-04-25T07:37:25Z" level=debug msg="forwarding client to server" client_call_digest= client_hostname=Kaspers-MBP-2.localdomain client_id=dei5chfcgot883qm2tch2wh8f server_id=3h06plss4crr4bgtqx6hkndd0 spanID=1a67c4f8fb872a25 traceID=c8f0a4054b33c5fa7784fd6752282e7f
time="2024-04-25T07:38:07Z" level=debug msg="engine metrics" cpu-count=10 cpu-idle=226125545 cpu-iowait=12891 cpu-irq=0 cpu-nice=0 cpu-softirq=44145 cpu-steal=0 cpu-system=336522 cpu-total=227403229 cpu-user=884126 dagger-server-count=1 disk-available-/=75965255680 disk-available-/var/lib/dagger=75965255680 disk-free-/=81581993984 disk-free-/var/lib/dagger=81581993984 disk-size-/=110044049408 disk-size-/var/lib/dagger=110044049408 goroutine-count=165 loadavg-1=0.87 loadavg-15=0.1 loadavg-5=0.29 mem-active=2528231424 mem-available=12679118848 mem-buffers=840466432 mem-cached=4569395200 mem-committed=3948711936 mem-free=6813241344 mem-inactive=3905118208 mem-mapped=748650496 mem-page-tables=9674752 mem-shmem=385974272 mem-slab=1232470016 mem-swap-cached=29573120 mem-swap-free=1034706944 mem-swap-total=1073737728 mem-total=14646792192 mem-vmalloc-used=21434368 proc-self-mem-anonymous=31412224 proc-self-mem-private-clean=38178816 proc-self-mem-private-dirty=31412224 proc-self-mem-pss=69591040 proc-self-mem-referenced=69595136 proc-self-mem-rss=69595136 proc-self-mem-shared-clean=4096 proc-self-mem-shared-dirty=0 proc-self-mem-swap=0 proc-self-mem-swap-pss=0 server-3h06plss4crr4bgtqx6hkndd0-client-count=3 uptime=63h16m34s
time="2024-04-25T07:39:07Z" level=debug msg="engine metrics" cpu-count=10 cpu-idle=226182841 cpu-iowait=12895 cpu-irq=0 cpu-nice=0 cpu-softirq=44372 cpu-steal=0 cpu-system=337603 cpu-total=227462396 cpu-user=884685 dagger-server-count=1 disk-available-/=75964829696 disk-available-/var/lib/dagger=75964829696 disk-free-/=81581568000 disk-free-/var/lib/dagger=81581568000 disk-size-/=110044049408 disk-size-/var/lib/dagger=110044049408 goroutine-count=167 loadavg-1=0.8 loadavg-15=0.14 loadavg-5=0.39 mem-active=2529685504 mem-available=12661223424 mem-buffers=840466432 mem-cached=4572168192 mem-committed=3958726656 mem-free=6794358784 mem-inactive=3914629120 mem-mapped=748720128 mem-page-tables=10043392 mem-shmem=388321280 mem-slab=1235222528 mem-swap-cached=29573120 mem-swap-free=1034706944 mem-swap-total=1073737728 mem-total=14646792192 mem-vmalloc-used=21499904 proc-self-mem-anonymous=32137216 proc-self-mem-private-clean=38178816 proc-self-mem-private-dirty=32137216 proc-self-mem-pss=70316032 proc-self-mem-referenced=70320128 proc-self-mem-rss=70320128 proc-self-mem-shared-clean=4096 proc-self-mem-shared-dirty=0 proc-self-mem-swap=0 proc-self-mem-swap-pss=0 server-3h06plss4crr4bgtqx6hkndd0-client-count=3 uptime=63h17m34s

our application

2024-04-25T09:38:24.209+0200    error   transport/consumer.go:90        [amqp] Consumer Go routine stopped with an error: initialize consumer on exchange integrationevents routing keys [identity.identity.IdentityDeleted]: declare exchange 'integrationevents': Exception (504) Reason: "channel/connection is not open"  {"log_type": "app", "template": "[amqp] Consumer Go routine stopped with an error: %v"}
go.lunarway.com/amqp/transport.(*Consumer).StartConsumer.func1
        /Users/kah/go/pkg/mod/go.lunarway.com/amqp@v1.4.5/transport/consumer.go:90
2024-04-25T09:38:24.209+0200    info    transport/consumer.go:92        [amqp] Consumer Go routine stopped      {"log_type": "app", "template": "[amqp] Consumer Go routine stopped"}
2024-04-25T09:38:34.208+0200    error   vostok@v1.2.4/logger.go:13      [vostok] Shutdown of components timed out (timeout: 10s). Exiting now!  {"log_type": "app", "template": "[vostok] Shutdown of components timed out (timeout: %s). Exiting now!"}

As you can tell, after a few attempts we simply cannot form the TCP connection with the rabbitmq service container. It may be that there aren't enough TCP connections available. However, I do see runs complete successfully the first time.