openssl / openssl

TLS/SSL and crypto library

Home Page:https://www.openssl.org

[3.4] 70-test_quic_tserver.t transient failure on NonStop in thread model

rsbeckerca opened this issue

I am experiencing a transient error in 70-test_quic_tserver.t. When running without V=0, the test will fail as follows:

70-test_quic_tserver.t ..................
        # ERROR: (bool) 'CRYPTO_THREAD_write_lock(fake_time_lock) == true' failed @ /home/ituglib/randall/openssl-klt/test/quic_tserver_test.c:314
        # false
        # 0000000000010DAB00000000000AA4F0:error:80001005:system library:dgram_recvmmsg:Operation would block:/home/ituglib/randall/openssl-klt/crypto/bio/bss_dgram.c:1742:
        # 0000000000010DAB00000000000AA4F0:error:80001005:system library:dgram_recvmmsg:Operation would block:/home/ituglib/randall/openssl-klt/crypto/bio/bss_dgram.c:1742:
        # 0000000000010DAB00000000000AA4F0:error:80001005:system library:dgram_recvmmsg:Operation would block:/home/ituglib/randall/openssl-klt/crypto/bio/bss_dgram.c:1742:
        # 0000000000010DAB00000000000AA4F0:error:80001005:system library:dgram_recvmmsg:Operation would block:/home/ituglib/randall/openssl-klt/crypto/bio/bss_dgram.c:1742:
        # 0000000000010DAB00000000000AA4F0:error:80001005:system library:dgram_recvmmsg:Operation would block:/home/ituglib/randall/openssl-klt/crypto/bio/bss_dgram.c:1742:
        # 0000000000010DAB00000000000AA4F0:error:80001005:system library:dgram_recvmmsg:Operation would block:/home/ituglib/randall/openssl-klt/crypto/bio/bss_dgram.c:1742:
        # 0000000000010DAB00000000000AA4F0:error:80001005:system library:dgram_recvmmsg:Operation would block:/home/ituglib/randall/openssl-klt/crypto/bio/bss_dgram.c:1742:
        # 0000000000010DAB00000000000AA4F0:error:80001005:system library:dgram_recvmmsg:Operation would block:/home/ituglib/randall/openssl-klt/crypto/bio/bss_dgram.c:1742:
        # 0000000000010DAB00000000000AA4F0:error:80001005:system library:dgram_recvmmsg:Operation would block:/home/ituglib/randall/openssl-klt/crypto/bio/bss_dgram.c:1742:
        # 0000000000010DAB00000000000AA4F0:error:80001005:system library:dgram_recvmmsg:Operation would block:/home/ituglib/randall/openssl-klt/crypto/bio/bss_dgram.c:1742:
        # 0000000000010DAB00000000000AA4F0:error:80001005:system library:dgram_recvmmsg:Operation would block:/home/ituglib/randall/openssl-klt/crypto/bio/bss_dgram.c:1742:
        # 0000000000010DAB00000000000AA4F0:error:80001005:system library:dgram_recvmmsg:Operation would block:/home/ituglib/randall/openssl-klt/crypto/bio/bss_dgram.c:1742:
        # 0000000000010DAB00000000000AA4F0:error:80001005:system library:dgram_recvmmsg:Operation would block:/home/ituglib/randall/openssl-klt/crypto/bio/bss_dgram.c:1742:
        # 0000000000010DAB00000000000AA4F0:error:80001005:system library:dgram_recvmmsg:Operation would block:/home/ituglib/randall/openssl-klt/crypto/bio/bss_dgram.c:1742:
        # 0000000000010DAB00000000000AA4F0:error:80001005:system library:dgram_recvmmsg:Operation would block:/home/ituglib/randall/openssl-klt/crypto/bio/bss_dgram.c:1742:
        # OPENSSL_TEST_RAND_SEED=1717508956
        not ok 8 - iteration 8
# ------------------------------------------------------------------------------
    # OPENSSL_TEST_RAND_SEED=1717508956
    not ok 1 - test_tserver
# ------------------------------------------------------------------------------
../../util/wrap.pl ../../test/quic_tserver_test ../../test/certs/servercert.pem ../../test/certs/serverkey.pem => 255
not ok 1
70-test_quic_tserver.t .................. 1/? ----------------------------------
#   Failed test at test/recipes/70-test_quic_tserver.t line 19.
70-test_quic_tserver.t .................. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests

This is repeatable until I run the test with V=1; it then passes, and keeps passing even without V=1. After a make clean, the failure returns. I suspect a missing thread context switch that happens to occur when V=1 generates output. This is non-critical, since the test does pass, even if it takes manual intervention to approve it.

Any pointers on where I can look for this?

The code is trying to call recvfrom on a non-blocking socket when there is no data available.
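A minimal POSIX sketch of that situation follows. It is not OpenSSL code, and the helper name `empty_nonblocking_recv_would_block` is made up for illustration. It binds a UDP socket on loopback, puts it in non-blocking mode, and calls recvfrom() with nothing queued. The call fails immediately with EAGAIN/EWOULDBLOCK, which is presumably the errno behind the repeated `dgram_recvmmsg:Operation would block` lines. The exact strerror() text varies by platform; glibc, for example, prints "Resource temporarily unavailable".

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if recvfrom() on an empty, non-blocking
 * UDP socket fails with EAGAIN/EWOULDBLOCK, 0 if it does anything else,
 * and -1 if the socket could not be set up. */
static int empty_nonblocking_recv_would_block(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    /* Same mode the QUIC test uses: never block inside recvfrom(). */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    char buf[64];
    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
    int saved = errno; /* capture before close() can change errno */
    close(fd);

    if (n < 0 && (saved == EAGAIN || saved == EWOULDBLOCK)) {
        printf("recvfrom: %s\n", strerror(saved));
        return 1;
    }
    return 0;
}
```

Nothing is wrong at this point; a non-blocking caller is expected to see this error and come back later. The question is what the caller does in between.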

My first guess is that you're building with a threading model that maps recvfrom to a non-thread-aware recvfrom call (spt_recvfrom vs. spt_recvfromx). As a result, the receive thread spins forever without yielding, so the sending thread never gets to submit data to the socket. Running with V=1 likely adds console output that happens to yield the CPU, which allows forward progress.
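To illustrate the difference between spinning and yielding, here is a generic POSIX pattern. It is a sketch only and not the code path OpenSSL's QUIC implementation actually uses. `recv_with_wait`, `RECV_TIMED_OUT` and `set_nonblocking` are hypothetical names. Instead of retrying recv() in a tight loop while it reports EAGAIN/EWOULDBLOCK, the receiver blocks in poll() until the socket is readable or a timeout expires. Blocking in poll() gives up the CPU, so a sending thread can run even under a scheduler that never preempts a spinning thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define RECV_TIMED_OUT (-2) /* hypothetical sentinel for "no data in time" */

/* Receive one datagram from a non-blocking socket, waiting in poll()
 * rather than busy-looping when no data is queued.
 * Returns the byte count, RECV_TIMED_OUT on timeout, or -1 on error. */
static ssize_t recv_with_wait(int fd, void *buf, size_t len, int timeout_ms)
{
    for (;;) {
        ssize_t n = recv(fd, buf, len, 0);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* Nothing queued: sleep in the kernel until readable or timeout. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0)
            return RECV_TIMED_OUT;
        if (rc < 0 && errno != EINTR)
            return -1;
        /* Readable (or interrupted): loop back and retry recv(). */
    }
}

/* Hypothetical helper: put a socket into non-blocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```

For example, with a `socketpair(AF_UNIX, SOCK_DGRAM, 0, sv)` whose receiving end is non-blocking, `recv_with_wait(sv[0], buf, sizeof(buf), 50)` returns `RECV_TIMED_OUT` while nothing has been sent. Once the peer calls `send(sv[1], "hi", 2, 0)`, the same call returns 2. A retry after EINTR restarts the full timeout, which is acceptable for a sketch but worth tightening in real code.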

We're actually dealing with a new kernel threading model, so this is not cooperative threading. It looks like some timeouts are causing the transient failures. This is going back to development.

OK, thank you for the update. Should this then be closed pending the outcome of your new internal development?

May I keep it open until I hear back? It should be a few weeks.

please, of course. I'll leave it in investigation state to follow up on.