testcontainers / testcontainers-dotnet

A library to support tests with throwaway instances of Docker containers for all compatible .NET Standard versions.

Home Page: https://dotnet.testcontainers.org

[Bug]: Postgres connection intermittently fails with: Connection refused

gbd3-en opened this issue

Testcontainers version

3.8.0

Using the latest Testcontainers version?

Yes

Host OS

MacOS

Host arch

arm64 (M1/Apple Silicon)

.NET version

8.0.201

Docker version

Client:
 Version:           24.0.7-rd
 API version:       1.42 (downgraded from 1.43)
 Go version:        go1.20.10
 Git commit:        72ffacf
 Built:             Wed Nov  1 18:41:50 2023
 OS/Arch:           darwin/arm64
 Context:           default

Server:
 Engine:
  Version:          23.0.6
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.20.11
  Git commit:       9dbdbd4b6d7681bd18c897a6ba0376073c2a72ff
  Built:            Fri Nov 17 20:59:57 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          v1.7.2
  GitCommit:        0cae528dd6cb557f7201036e9f43420650207b58
 runc:
  Version:          1.1.12
  GitCommit:        51d5e94601ceffbbd85688df1c928ecccbfa4685
 docker-init:
  Version:          0.19.0
  GitCommit:

Docker info

Client:
 Version:    24.0.7-rd
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.0
    Path:     /Users/myuser/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.23.3
    Path:     /Users/myuser/.docker/cli-plugins/docker-compose

Server:
 Containers: 7
  Running: 2
  Paused: 0
  Stopped: 5
 Images: 30
 Server Version: 23.0.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 0cae528dd6cb557f7201036e9f43420650207b58
 runc version: 51d5e94601ceffbbd85688df1c928ecccbfa4685
 init version: 
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 6.1.75-0-virt
 Operating System: Alpine Linux v3.18
 OSType: linux
 Architecture: aarch64
 CPUs: 2
 Total Memory: 3.826GiB
 Name: lima-rancher-desktop
 ID: TOEU:WHHK:4PR6:MRPD:LK4X:PHRT:DPZK:STPC:6DYN:CXWF:OA6O:V7BH
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

What happened?

Problem description

Good afternoon,

I am trying to spin up a Postgres container and connect to it from a unit test. The connection intermittently fails with SocketException: Connection refused.

The connection succeeds if I:

  • run the tests in Debug mode
  • add a delay after StartAsync()

public sealed class PostgresContainerTests : IAsyncLifetime {

    private readonly PostgreSqlContainer _postgres = new PostgreSqlBuilder()
        // removing the WithImage does not fix the issue
        .WithImage("postgres:14.6")
        .Build();

    public async Task InitializeAsync() {
        ConsoleLogger.Instance.DebugLogLevelEnabled = true;
        DotNet.Testcontainers.Configurations.TestcontainersSettings.Logger = ConsoleLogger.Instance;

        await _postgres.StartAsync();

        // TODO: tests sometimes fail with "Failed to connect",
        //    but always work if I run them in Debug mode
        //    OR when adding a few seconds of delay
        //await Task.Delay(5000);
    }
}

Test output

Output from dotnet test:

A total of 1 test files matched the specified pattern.
[testcontainers.org 00:00:00.04] Connected to Docker:
 Host: unix:///var/run/docker.sock
 Server Version: 23.0.6
 Kernel Version: 6.1.75-0-virt
 API Version: 1.42
 Operating System: Alpine Linux v3.18
 Total Memory: 3.83 GB
[testcontainers.org 00:00:00.11] Docker container a6c524bd4523 created
[testcontainers.org 00:00:00.13] Start Docker container a6c524bd4523
[testcontainers.org 00:00:00.37] Wait for Docker container a6c524bd4523 to complete readiness checks
[testcontainers.org 00:00:00.37] Docker container a6c524bd4523 ready
[testcontainers.org 00:00:00.39] Docker container 74b317f809f8 created
[testcontainers.org 00:00:00.40] Start Docker container 74b317f809f8
[testcontainers.org 00:00:00.67] Wait for Docker container 74b317f809f8 to complete readiness checks
[testcontainers.org 00:00:00.67] Execute "pg_isready --host localhost --dbname postgres --username postgres" at Docker container 74b317f809f8
[testcontainers.org 00:00:01.79] Execute "pg_isready --host localhost --dbname postgres --username postgres" at Docker container 74b317f809f8
[testcontainers.org 00:00:01.89] Docker container 74b317f809f8 ready
[testcontainers.org 00:00:02.27] Delete Docker container 74b317f809f8
[xUnit.net 00:00:02.63]     MyApi.Tests.PostgresContainerTests.CanCreateAndListABlog [FAIL]
 Failed MyApi.Tests.PostgresContainerTests.CanCreateAndListABlog [398 ms]
 Error Message:
  Npgsql.NpgsqlException : Failed to connect to 127.0.0.1:32813
---- System.Net.Sockets.SocketException : Connection refused

My environment:

  • macOS Ventura on Apple Silicon / arm64
  • .NET 8
  • Rancher Desktop v1.12.3

Questions

This happened in both Testcontainers.PostgreSql 3.7.0 and 3.8.0, which seem to use different wait strategies (1) (2).

It makes me wonder whether the wait strategy is being ignored.

Any thoughts on where I can look next? Thanks!

Relevant log output

No response

Additional information

No response

My first assumption was that the new wait strategy caused the issue. However, since you are experiencing the same issue with the previous version, I doubt that this is the case. In the past, with container runtimes other than Docker, we noticed that the chosen host port is not always immediately available or linked. I guess you are running into a similar issue, and this is not something we can address or fix in Testcontainers: rancher-sandbox/rancher-desktop#3141. (I think on macOS you can try to use Testcontainers Desktop or the following configuration to set up Rancher Desktop.)

Thanks for the Rancher issue link!

I am unable to reproduce this outside of Rancher Desktop on macOS (I tried Windows/Rancher, macOS/Docker Desktop, and Linux/Moby). This confirms that slow port availability is the cause.

I've added this code to work around it:

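using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;
using DotNet.Testcontainers.Containers;
using Testcontainers.PostgreSql;
using Xunit;

public sealed class PostgresContainerTests : IAsyncLifetime {
    private readonly PostgreSqlContainer _postgres = new PostgreSqlBuilder()
        .WithImage("postgres:14.6")
        .Build();

    public async Task InitializeAsync() {
        await _postgres.StartAsync();
        await _postgres.WaitForPort();  // <--- workaround
    }

    public Task DisposeAsync() => _postgres.DisposeAsync().AsTask();
}

public static class TestcontainerWorkaround {

    // Waits for the mapped PostgreSQL host port (default timeout: 10 seconds).
    public static Task<bool> WaitForPort(this PostgreSqlContainer container, TimeSpan? maxWait = null) {
        return WaitForPort(container, PostgreSqlBuilder.PostgreSqlPort, maxWait ?? TimeSpan.FromSeconds(10));
    }

    // Returns true once the mapped host port accepts a TCP connection, false if maxWait elapses first.
    public static async Task<bool> WaitForPort(this DockerContainer container, int unmappedPort, TimeSpan maxWait) {
        var ips = await Dns.GetHostAddressesAsync(container.Hostname);
        if (ips.Length != 1) {
            throw new ArgumentException($"Expected 1 IP to resolve from '{container.Hostname}', but got {ips.Length}");
        }

        int portNumber = container.GetMappedPublicPort(unmappedPort);

        using CancellationTokenSource ts = new();
        ts.CancelAfter(maxWait);

        try {
            while (!ts.IsCancellationRequested) {
                // Use a fresh client per attempt; a socket is not guaranteed
                // to be reusable after a failed connect.
                using var tcpClient = new TcpClient();
                try {
                    await tcpClient.ConnectAsync(ips[0], portNumber, ts.Token);
                    return true;
                }
                catch (SocketException) {
                    // Port not reachable yet; retry after a short pause.
                }
                await Task.Delay(500, ts.Token);
            }
        }
        catch (OperationCanceledException) {
            // maxWait elapsed while connecting or delaying.
        }

        return false;
    }
}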

Hi, I am trying to use the same workaround. It worked well in my local environment, but in an Azure pipeline it throws an "Expected 1 IP to resolve from ... but got 2" exception. How can I solve this? Please help.
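
One plausible cause, which is an assumption and not confirmed in this thread, is that the container hostname resolves to both an IPv4 and an IPv6 loopback address on the pipeline agent, so the single-address check in the workaround above throws. A minimal sketch of a variant that tries every resolved address instead of requiring exactly one is shown below. PortProbe and WaitForPortAsync are hypothetical names, not part of Testcontainers, and the sketch assumes .NET 6 or later.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Testcontainers.
public static class PortProbe {
    // Returns true once any address resolved from `host` accepts a TCP
    // connection on `port`, or false if `maxWait` elapses first.
    public static async Task<bool> WaitForPortAsync(string host, int port, TimeSpan maxWait) {
        using var cts = new CancellationTokenSource(maxWait);
        try {
            // May return several addresses, e.g. 127.0.0.1 and ::1 for "localhost".
            IPAddress[] addresses = await Dns.GetHostAddressesAsync(host, cts.Token);
            while (true) {
                foreach (IPAddress address in addresses) {
                    // Fresh client per attempt, matching the address family being tried.
                    using var client = new TcpClient(address.AddressFamily);
                    try {
                        await client.ConnectAsync(address, port, cts.Token);
                        return true;
                    }
                    catch (SocketException) {
                        // Not reachable on this address yet; try the next one.
                    }
                }
                await Task.Delay(500, cts.Token);
            }
        }
        catch (OperationCanceledException) {
            // maxWait elapsed while resolving, connecting, or delaying.
            return false;
        }
    }
}
```

With the test class from the workaround, this could be called after StartAsync() as await PortProbe.WaitForPortAsync(_postgres.Hostname, _postgres.GetMappedPublicPort(PostgreSqlBuilder.PostgreSqlPort), TimeSpan.FromSeconds(10)). Whether this resolves the pipeline failure depends on the actual addresses returned there, which the thread does not show.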