tlsfuzzer / tlsfuzzer

SSL and TLS protocol test suite and fuzzer

Add means to set default socket timeout to `scripts_retention.py`

ueno opened this issue · comments

Feature request

Is your feature request related to a problem? Please describe

On a slow CI setup, some of the tlsfuzzer tests integrated into the GnuTLS test suite, such as test-tls13-ffdhe-sanity.py, fail intermittently with:

test-tls13-ffdhe-sanity.py:stdout:    self._read_buffer += self.socket.recv(max(4096, bufsize))
test-tls13-ffdhe-sanity.py:stdout:socket.timeout: timed out
test-tls13-ffdhe-sanity.py:stdout:
test-tls13-ffdhe-sanity.py:stdout:During handling of the above exception, another exception occurred:
test-tls13-ffdhe-sanity.py:stdout:
test-tls13-ffdhe-sanity.py:stdout:Traceback (most recent call last):
test-tls13-ffdhe-sanity.py:stdout:  File "/builds/dueno/gnutls/tests/suite/tls-fuzzer/tlsfuzzer/scripts/test-tls13-ffdhe-sanity.py", line 215, in main
test-tls13-ffdhe-sanity.py:stdout:    runner.run()
test-tls13-ffdhe-sanity.py:stdout:  File "/builds/dueno/gnutls/tests/suite/tls-fuzzer/tlsfuzzer/tlsfuzzer/runner.py", line 221, in run
test-tls13-ffdhe-sanity.py:stdout:    raise AssertionError(
test-tls13-ffdhe-sanity.py:stdout:AssertionError: Timeout when waiting for peer message

It looks like some of the tests provide a -t timeout option, but for this particular test there is no way to extend the timeout.

Describe the solution you'd like

It would be nice if the default socket timeout could be set in a single place, e.g., as an option to scripts_retention.py, maybe using socket.setdefaulttimeout.

Describe alternatives you've considered

It might also be an option to add -t timeout everywhere, though it would make the integration harder.

Additional context

Here is the link to the CI log.

The problem is that scripts_retention.py executes the scripts as normal executables (it doesn't import them), so it has no special influence on their runtime behavior; it can only change environment variables and/or command-line options.

Using socket.setdefaulttimeout won't work because the default is always overridden:

    def __init__(self, hostname, port, version=(3, 0), timeout=5):
        """
        Provide minimal settings needed to connect to other peer.

        :param str hostname: host name of the server to connect to
        :param int port: :term:`TCP` port number to connect to
        :param tuple(int,int) version: the protocol version used in the
            record layer for the initial messages
        :param float timeout: amount of time to wait while expecting a message
            before aborting the connection, in seconds
        """
        super(Connect, self).__init__()
        self.hostname = hostname
        self.port = port
        self.version = version
        self.timeout = timeout

    def process(self, state):
        """Connect to a server."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
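The override described above can be reproduced with the standard library alone. The following is a minimal sketch, not tlsfuzzer code: the helper name `observe_timeouts` and the 30-second value are made up for illustration, and no connection is ever opened.

```python
import socket


def observe_timeouts(default_timeout=30.0, explicit_timeout=5.0):
    """Return (inherited, overridden) timeouts seen on a fresh socket.

    Hypothetical helper for illustration only; it mirrors the
    settimeout() call in the Connect.process() excerpt.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(default_timeout)
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # A new socket starts out with the process-wide default.
            inherited = sock.gettimeout()
            # An explicit per-socket timeout replaces that default.
            sock.settimeout(explicit_timeout)
            overridden = sock.gettimeout()
        finally:
            sock.close()
    finally:
        # Restore whatever default was in place before the call.
        socket.setdefaulttimeout(previous)
    return inherited, overridden


if __name__ == "__main__":
    print(observe_timeouts())
```

Since the per-socket value takes precedence, a larger process-wide default would only help if the explicit `settimeout` call were skipped, which is why the timeout has to reach the script itself.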

Adding support for -t timeout to all scripts would be better. I'm not sure why you think integration would be harder: there's support for common_arguments to pass -t timeout to all scripts (including an option to override it on a per-script basis), e.g.:

"common_arguments" : ["-k", "tests/clientX509Key.pem",
                      "-c", "tests/clientX509Cert.pem"],

> there's support for common_arguments to pass -t timeout to all scripts (including option to override it on a per-script basis)

Oh, ok; I wasn't aware of that option.

> Adding support for -t timeout to all scripts would be better.

One more thing that might prevent this is that test-tls13-count-tickets.py uses -t for a different meaning (ticket count).

> One more thing that might prevent this is that test-tls13-count-tickets.py uses -t for a different meaning (ticket count).

hmm, technically it's reserved for the timeout:

# -t timeout to wait for messages (also count of NSTs in
# test-tls13-count-tickets.py)

but yes, that script is the exception, we may need to break compatibility on it
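For illustration, handling a `-t` timeout option in a script might look roughly like the sketch below. It uses the standard `getopt` module. The function name `parse_timeout`, the option string, and the `-h`/`-p` options are assumptions made for this example, not code taken from tlsfuzzer; the 5-second fallback matches the `Connect` default quoted earlier in the thread.

```python
import getopt

# Fallback matching the Connect(timeout=5) default quoted earlier.
DEFAULT_TIMEOUT = 5.0


def parse_timeout(argv):
    """Return the timeout in seconds from a getopt-style argument list.

    Hypothetical helper: "-h" and "-p" are accepted only so that a
    typical host/port command line parses; their values are ignored.
    """
    timeout = DEFAULT_TIMEOUT
    opts, _ = getopt.getopt(argv, "h:p:t:")
    for opt, arg in opts:
        if opt == "-t":
            timeout = float(arg)
    return timeout


if __name__ == "__main__":
    print(parse_timeout(["-h", "localhost", "-p", "4433", "-t", "30"]))
```

The parsed value would then be passed on as the `timeout` argument when the connection node is constructed.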

I've been thinking a bit more about it, and I wonder if it wouldn't be better to just change the timeout in that one script (test-tls13-ffdhe-sanity.py), maybe even making its default larger in the script itself...

The thing is that 5 seconds for a reply, especially on localhost, is quite generous, but this particular test runs through all the different FFDHE key sizes, and 8192-bit DH is really slow, so on an overwhelmed CI host I can imagine the calculation alone taking 5 seconds... At the same time, even the slowest ECDH should be faster by a few orders of magnitude, and the same goes for 2048-bit DH, so other scripts shouldn't really be impacted.
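The cost gap between DH sizes can be illustrated with a rough timing sketch. It is not a benchmark of any TLS library: it times Python's built-in `pow` on a random odd modulus as a stand-in for the real FFDHE primes, and it assumes full-length exponents, which real implementations may shorten. The function name `time_modexp` is made up for this example.

```python
import random
import time


def time_modexp(bits, repeats=3, seed=0):
    """Return the best-of-`repeats` time for one modular exponentiation.

    Illustrative stand-in for a DH operation: the modulus is a random
    odd number of the requested size, not an actual FFDHE group prime.
    """
    rng = random.Random(seed)
    # Force the top bit (full size) and the low bit (odd modulus).
    modulus = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
    best = float("inf")
    for _ in range(repeats):
        # Assumption: the exponent is as long as the modulus.
        exponent = rng.getrandbits(bits) | (1 << (bits - 1))
        start = time.perf_counter()
        pow(2, exponent, modulus)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    for bits in (2048, 8192):
        print(f"{bits}-bit modexp: {time_modexp(bits):.4f} s")
```

Absolute numbers depend heavily on the machine and on the bignum implementation, so this only shows the trend: each operation at the 8192-bit size takes far longer than at 2048 bits, and a loaded CI host stretches that further.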

fixed by #782