[BUG]: Connection issue on biredis01
markovamaria opened this issue · comments
Describe the bug
After running for some time, redis-benchmarks-spec-sc-coordinator (run without supervisorctl) failed with a connection error.
The time to failure varied between runs: about 2 hours in some experiments and about 3 hours in others.
The log is attached: test3.log
To Reproduce
Steps to reproduce the behavior:
- Run redis-benchmarks-spec-sc-coordinator without supervisorctl:
# redis-benchmarks-spec-sc-coordinator --platform-name intel64-ubuntu20.04-biredis --event_stream_host benchmarks.redislabs.com --event_stream_port 12010 --event_stream_pass <redacted> --event_stream_user default --datasink_push_results_redistimeseries --datasink_redistimeseries_host benchmarks.redislabs.com --datasink_redistimeseries_port 12011 --datasink_redistimeseries_pass <redacted> --logname /var/opt/test2.log --redis_proc_start_port 6379 --cpuset_start_pos 0 --docker-air-gap --consumer-id
- Trigger testing from a remote server:
redis-benchmarks-spec-cli --use-branch --from-date 2023-02-03 --redis_port 12010 --redis_host benchmarks.redislabs.com --redis_pass ***
Expected behavior
No failures are expected.
Screenshots/CLI snippets
0benchmark time-series [00:00, ?benchmark time-series/s]
100%|############...###########| 35/35 [00:10<00:00, 3.47benchmark time-series/s]
100%|#######...################| 15/15 [00:04<00:00, 3.46benchmark time-series/s]
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/redis/connection.py", line 824, in read_response
response = self._parser.read_response(disable_decoding=disable_decoding)
File "/usr/local/lib/python3.8/dist-packages/redis/connection.py", line 467, in read_response
self.read_from_socket()
File "/usr/local/lib/python3.8/dist-packages/redis/connection.py", line 421, in read_from_socket
bufflen = self._sock.recv_into(self._buffer)
ConnectionResetError: [Errno 104] Connection reset by peer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/redis-benchmarks-spec-sc-coordinator", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/redis_benchmarks_specification/__self_contained_coordinator__/self_contained_coordinator.py", line 228, in main
_, stream_id, _, _ = self_contained_coordinator_blocking_read(
File "/usr/local/lib/python3.8/dist-packages/redis_benchmarks_specification/__self_contained_coordinator__/self_contained_coordinator.py", line 277, in self_contained_coordinator_blocking_read
newTestInfo = conn.xreadgroup(
File "/usr/local/lib/python3.8/dist-packages/redis/commands/core.py", line 3849, in xreadgroup
return self.execute_command("XREADGROUP", *pieces)
File "/usr/local/lib/python3.8/dist-packages/redis/client.py", line 1238, in execute_command
return conn.retry.call_with_retry(
File "/usr/local/lib/python3.8/dist-packages/redis/retry.py", line 49, in call_with_retry
fail(error)
File "/usr/local/lib/python3.8/dist-packages/redis/client.py", line 1242, in <lambda>
lambda error: self._disconnect_raise(conn, error),
File "/usr/local/lib/python3.8/dist-packages/redis/client.py", line 1228, in _disconnect_raise
raise error
File "/usr/local/lib/python3.8/dist-packages/redis/retry.py", line 46, in call_with_retry
return do()
File "/usr/local/lib/python3.8/dist-packages/redis/client.py", line 1239, in <lambda>
lambda: self._send_command_parse_response(
File "/usr/local/lib/python3.8/dist-packages/redis/client.py", line 1215, in _send_command_parse_response
return self.parse_response(conn, command_name, **options)
File "/usr/local/lib/python3.8/dist-packages/redis/client.py", line 1254, in parse_response
response = connection.read_response()
File "/usr/local/lib/python3.8/dist-packages/redis/connection.py", line 830, in read_response
raise ConnectionError(f"Error while reading from {hosterr}" f" : {e.args}")
redis.exceptions.ConnectionError: Error while reading from benchmarks.redislabs.com:12010 : (104, 'Connection reset by peer')
Environment (please complete the following information):
Run on biredis01
It appears that this may be related to the OS settings for keeping a TCP connection alive.
After setting:
echo "Set TCP keepalive"
echo 300 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 60 > /proc/sys/net/ipv4/tcp_keepalive_intvl
It seems to have fixed the issue (2 weeks+ without a disconnect).
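The settings above change the keepalive timers for every TCP socket on the host. As a hedged alternative, the same timers can be set on a single socket from Python. The sketch below is an illustration only, not code from the coordinator: `keepalive_options` and `enable_keepalive` are hypothetical helper names, the values 300 and 60 simply mirror the sysctl settings above, and `TCP_KEEPIDLE` / `TCP_KEEPINTVL` are only available on some platforms (Linux among them), so each is guarded.

```python
import socket

# Values mirror the sysctl settings quoted above (seconds). Assumed, not measured.
KEEPALIVE_IDLE_S = 300      # idle time before the first keepalive probe
KEEPALIVE_INTERVAL_S = 60   # delay between consecutive probes


def keepalive_options():
    """Build a {TCP option: value} map for the keepalive timers.

    TCP_KEEPIDLE and TCP_KEEPINTVL are not defined on every platform,
    so each one is included only when the socket module exposes it.
    """
    opts = {}
    if hasattr(socket, "TCP_KEEPIDLE"):
        opts[socket.TCP_KEEPIDLE] = KEEPALIVE_IDLE_S
    if hasattr(socket, "TCP_KEEPINTVL"):
        opts[socket.TCP_KEEPINTVL] = KEEPALIVE_INTERVAL_S
    return opts


def enable_keepalive(sock, opts):
    """Turn keepalive on for one socket and apply the per-socket timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for opt, value in opts.items():
        sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

redis-py connections accept `socket_keepalive=True` and a `socket_keepalive_options` dict of this shape, so the map returned by `keepalive_options()` could be passed to the client directly. Whether the coordinator exposes those parameters is not stated in this issue.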
Closing this one given the above feedback.
TODO: document this in the README so that anyone who faces the same issue has a set of tuning configurations to start from.
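For such a README note, a small check of the current kernel settings may be useful. The sketch below is an assumption-laden illustration: `read_keepalive_settings` and `dead_peer_detection_seconds` are hypothetical helper names, and the probe count is not mentioned anywhere in this issue. On Linux the documented defaults are 7200 s for `tcp_keepalive_time`, 75 s for `tcp_keepalive_intvl`, and 9 for `tcp_keepalive_probes`, which means an idle connection sends no keepalive traffic for the first two hours.

```python
from pathlib import Path

PROC_DIR = Path("/proc/sys/net/ipv4")  # Linux only
SETTING_NAMES = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive_settings(proc_dir=PROC_DIR):
    """Return the kernel keepalive settings as ints (None where unreadable)."""
    settings = {}
    for name in SETTING_NAMES:
        try:
            settings[name] = int((proc_dir / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


def dead_peer_detection_seconds(time_s, intvl_s, probes):
    """Approximate time for the kernel to declare an idle, unresponsive peer dead.

    Idle time before the first probe plus one interval per unanswered probe.
    """
    return time_s + intvl_s * probes


if __name__ == "__main__":
    print(read_keepalive_settings())
    # Values from this issue, with the default probe count of 9 (an assumption).
    print(dead_peer_detection_seconds(300, 60, 9))
```

With the values from this issue and 9 probes, the estimate is 300 + 60 × 9 = 840 seconds, compared with 7200 + 75 × 9 = 7875 seconds under the defaults.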