[Bug]: Postgres connection intermittently fails with: Connection refused
gbd3-en opened this issue
Testcontainers version
3.8.0
Using the latest Testcontainers version?
Yes
Host OS
MacOS
Host arch
arm64 (M1/Apple Silicon)
.NET version
8.0.201
Docker version
```
Client:
 Version: 24.0.7-rd
 API version: 1.42 (downgraded from 1.43)
 Go version: go1.20.10
 Git commit: 72ffacf
 Built: Wed Nov 1 18:41:50 2023
 OS/Arch: darwin/arm64
 Context: default

Server:
 Engine:
  Version: 23.0.6
  API version: 1.42 (minimum version 1.12)
  Go version: go1.20.11
  Git commit: 9dbdbd4b6d7681bd18c897a6ba0376073c2a72ff
  Built: Fri Nov 17 20:59:57 2023
  OS/Arch: linux/arm64
  Experimental: false
 containerd:
  Version: v1.7.2
  GitCommit: 0cae528dd6cb557f7201036e9f43420650207b58
 runc:
  Version: 1.1.12
  GitCommit: 51d5e94601ceffbbd85688df1c928ecccbfa4685
 docker-init:
  Version: 0.19.0
  GitCommit:
```
Docker info
```
Client:
 Version: 24.0.7-rd
 Context: default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version: v0.12.0
    Path: /Users/myuser/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version: v2.23.3
    Path: /Users/myuser/.docker/cli-plugins/docker-compose

Server:
 Containers: 7
  Running: 2
  Paused: 0
  Stopped: 5
 Images: 30
 Server Version: 23.0.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 0cae528dd6cb557f7201036e9f43420650207b58
 runc version: 51d5e94601ceffbbd85688df1c928ecccbfa4685
 init version:
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 6.1.75-0-virt
 Operating System: Alpine Linux v3.18
 OSType: linux
 Architecture: aarch64
 CPUs: 2
 Total Memory: 3.826GiB
 Name: lima-rancher-desktop
 ID: TOEU:WHHK:4PR6:MRPD:LK4X:PHRT:DPZK:STPC:6DYN:CXWF:OA6O:V7BH
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
```
What happened?
Problem description
Good afternoon,
I am trying to spin up a Postgres container and connect to it from a unit test. It intermittently fails with `SocketException: Connection refused`.
The connection succeeds if I:
- run the tests in Debug mode
- add a delay after `StartAsync()`
```csharp
using System.Threading.Tasks;
using DotNet.Testcontainers.Configurations;
using Testcontainers.PostgreSql;
using Xunit;

public sealed class PostgresContainerTests : IAsyncLifetime {
    private readonly PostgreSqlContainer _postgres = new PostgreSqlBuilder()
        // removing the WithImage does not fix the issue
        .WithImage("postgres:14.6")
        .Build();

    public async Task InitializeAsync() {
        ConsoleLogger.Instance.DebugLogLevelEnabled = true;
        DotNet.Testcontainers.Configurations.TestcontainersSettings.Logger = ConsoleLogger.Instance;
        await _postgres.StartAsync();
        // TODO: tests sometimes fail with "Failed to connect",
        // but always work if I run them in Debug mode
        // OR when adding a few seconds of delay
        //await Task.Delay(5000);
    }

    // IAsyncLifetime also requires DisposeAsync; stop and remove the container here.
    public Task DisposeAsync() => _postgres.DisposeAsync().AsTask();
}
```
Test output
Output from `dotnet test`:

```
A total of 1 test files matched the specified pattern.
[testcontainers.org 00:00:00.04] Connected to Docker:
  Host: unix:///var/run/docker.sock
  Server Version: 23.0.6
  Kernel Version: 6.1.75-0-virt
  API Version: 1.42
  Operating System: Alpine Linux v3.18
  Total Memory: 3.83 GB
[testcontainers.org 00:00:00.11] Docker container a6c524bd4523 created
[testcontainers.org 00:00:00.13] Start Docker container a6c524bd4523
[testcontainers.org 00:00:00.37] Wait for Docker container a6c524bd4523 to complete readiness checks
[testcontainers.org 00:00:00.37] Docker container a6c524bd4523 ready
[testcontainers.org 00:00:00.39] Docker container 74b317f809f8 created
[testcontainers.org 00:00:00.40] Start Docker container 74b317f809f8
[testcontainers.org 00:00:00.67] Wait for Docker container 74b317f809f8 to complete readiness checks
[testcontainers.org 00:00:00.67] Execute "pg_isready --host localhost --dbname postgres --username postgres" at Docker container 74b317f809f8
[testcontainers.org 00:00:01.79] Execute "pg_isready --host localhost --dbname postgres --username postgres" at Docker container 74b317f809f8
[testcontainers.org 00:00:01.89] Docker container 74b317f809f8 ready
[testcontainers.org 00:00:02.27] Delete Docker container 74b317f809f8
[xUnit.net 00:00:02.63] MyApi.Tests.PostgresContainerTests.CanCreateAndListABlog [FAIL]
  Failed MyApi.Tests.PostgresContainerTests.CanCreateAndListABlog [398 ms]
  Error Message:
   Npgsql.NpgsqlException : Failed to connect to 127.0.0.1:32813
---- System.Net.Sockets.SocketException : Connection refused
```
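The log hints at why the readiness check can pass while the test still fails: `pg_isready --host localhost` is executed inside the container (the `Execute ... at Docker container` lines), so it only shows that Postgres accepts connections in the container's own network namespace. The test instead connects from the host to the published port (`127.0.0.1:32813`), which additionally depends on the runtime forwarding that port. A minimal host-side probe, sketched below with BCL types only, makes that distinction explicit; `HostPortProbe.CanConnectAsync` is a hypothetical name, not a Testcontainers API.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of Testcontainers): a single TCP connection
// attempt from the host to a published port, bounded by a timeout.
public static class HostPortProbe
{
    public static async Task<bool> CanConnectAsync(string host, int port, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        // A fresh client per call, so no state is shared between attempts.
        using var client = new TcpClient();
        try
        {
            // The string overload resolves the host name before connecting.
            await client.ConnectAsync(host, port, cts.Token);
            return true;
        }
        catch (SocketException)
        {
            // e.g. "Connection refused": nothing is listening or forwarding yet.
            return false;
        }
        catch (OperationCanceledException)
        {
            // The attempt did not complete within the timeout.
            return false;
        }
    }
}
```

Called right after `StartAsync()` with `container.Hostname` and the mapped port, a `false` result would indicate that the in-container check passed but the host-side port was not reachable yet.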
My environment:
- macOS Ventura on Apple Silicon / arm64
- .NET 8
- Rancher Desktop 1.12.3
Questions
This happened with both Testcontainers.PostgreSql 3.7.0 and 3.8.0, which seem to use different wait strategies (1) (2).
It makes me wonder whether the wait strategy is being ignored.
Any thoughts on where I can look next? Thanks!
Relevant log output
No response
Additional information
No response
My first assumption was that the new wait strategy caused the issue. However, since you see the same behavior with the previous version, I doubt that is the case. From past experience with container runtimes other than Docker, we noticed that the chosen host port is not always immediately available (or linked) after the container starts. I guess you are running into a similar issue, and this is not something we can address or fix in Testcontainers: rancher-sandbox/rancher-desktop#3141 (on macOS, I think you can try Testcontainers Desktop, or the following configuration to set up Rancher Desktop).
Thanks for the Rancher issue link!
I cannot reproduce it outside of Rancher Desktop on macOS (I tried Windows/Rancher, macOS/Docker Desktop, and Linux/Moby). That points to the delayed availability of the published port.
I've added this code to work around it:
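```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;
using DotNet.Testcontainers.Containers;
using Testcontainers.PostgreSql;
using Xunit;

public sealed class PostgresContainerTests : IAsyncLifetime {
    private readonly PostgreSqlContainer _postgres = new PostgreSqlBuilder()
        .WithImage("postgres:14.6")
        .Build();

    public async Task InitializeAsync() {
        await _postgres.StartAsync();
        await _postgres.WaitForPort(); // <--- workaround
    }

    public Task DisposeAsync() => _postgres.DisposeAsync().AsTask();
}

public static class TestcontainerWorkaround {
    public static Task<bool> WaitForPort(this PostgreSqlContainer container, TimeSpan? maxWait = null) {
        return WaitForPort(container, PostgreSqlBuilder.PostgreSqlPort, maxWait ?? TimeSpan.FromSeconds(10));
    }

    // Polls the mapped host port until a TCP connection succeeds; returns false after maxWait.
    public static async Task<bool> WaitForPort(this DockerContainer container, int unmappedPort, TimeSpan maxWait) {
        var ips = await Dns.GetHostAddressesAsync(container.Hostname);
        if (ips.Length != 1) {
            throw new ArgumentException($"Expected 1 IP to resolve from '{container.Hostname}', but got {ips.Length}");
        }
        int portNumber = container.GetMappedPublicPort(unmappedPort);
        using var ts = new CancellationTokenSource(maxWait);
        try {
            while (true) {
                try {
                    // Fresh client per attempt, so nothing carries over from a failed connect.
                    using var tcpClient = new TcpClient();
                    await tcpClient.ConnectAsync(ips[0], portNumber, ts.Token);
                    return true;
                }
                catch (SocketException) {
                    // Port not forwarded yet ("Connection refused"); retry after a short pause.
                }
                await Task.Delay(500, ts.Token);
            }
        }
        catch (OperationCanceledException) {
            // maxWait elapsed: report failure instead of letting the cancellation escape.
            return false;
        }
    }
}
```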
Hi, I am trying to use the same workaround. It works well in my local environment, but in an Azure pipeline it throws the "Expected 1 IP to resolve from '…', but got 2" exception. How can I solve this? Please help.
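The exception comes from the `ips.Length != 1` guard in the workaround, which is stricter than the probe needs. When `container.Hostname` is a host name rather than an IP literal (for example `localhost`, which often resolves to both `::1` and `127.0.0.1`), `Dns.GetHostAddressesAsync` can legitimately return two addresses. One option, sketched below under the assumption that reaching the mapped port on any resolved address is good enough for the test, is to try every address on each round instead of requiring exactly one. `PortWaiter.WaitForAnyAddressAsync` is a hypothetical name; with Testcontainers you would pass `container.Hostname` and `container.GetMappedPublicPort(...)` just as the workaround does.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: wait until the port accepts TCP connections on ANY
// address the host name resolves to (e.g. both ::1 and 127.0.0.1).
public static class PortWaiter
{
    public static async Task<bool> WaitForAnyAddressAsync(
        string hostname, int port, TimeSpan maxWait, TimeSpan? retryInterval = null)
    {
        var interval = retryInterval ?? TimeSpan.FromMilliseconds(500);
        using var cts = new CancellationTokenSource(maxWait);
        try
        {
            while (true)
            {
                // Accept however many addresses the resolver returns.
                IPAddress[] addresses = await Dns.GetHostAddressesAsync(hostname, cts.Token);
                foreach (var address in addresses)
                {
                    try
                    {
                        // Fresh client per attempt, matching the address family (IPv4 or IPv6).
                        using var client = new TcpClient(address.AddressFamily);
                        await client.ConnectAsync(address, port, cts.Token);
                        return true;
                    }
                    catch (SocketException)
                    {
                        // Refused, unreachable, or family unsupported here; try the next address.
                    }
                }
                await Task.Delay(interval, cts.Token);
            }
        }
        catch (OperationCanceledException)
        {
            // maxWait elapsed before any address accepted a connection.
            return false;
        }
    }
}
```

Trying all addresses keeps the check tolerant of dual-stack resolvers; an alternative with the same effect would be to filter `ips` to `AddressFamily.InterNetwork` before connecting, at the cost of ignoring IPv6-only setups.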