The Stirling Socket Tracer fails to record relevant request and response entries when the response is excessively large
hengyoush opened this issue · comments
Describe the bug
I built Stirling and used Stirling_wrapper for testing. I used redis-benchmark to send two consecutive GET commands. The value for the key is large, around 3 MB. I can see the requests and responses in tcpdump, but they do not appear in the Stirling_wrapper output. For smaller values, in the range of a few tens of bytes, I can observe them in the Stirling_wrapper output.
The redis-benchmark command I executed (I pre-created the key and the large value):
redis-benchmark -h {ip} -r 3 -c 1 -n 2 get __rand_int__
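For reference, assuming redis-benchmark's usual behavior of substituting `__rand_int__` with a zero-padded 12-digit random integer in `[0, r)`, the `-r 3` flag means only three keys can be requested, so these are the keys that would need pre-created 3 MB values:

```python
# Keys redis-benchmark can generate for "get __rand_int__" with -r 3,
# assuming its standard 12-digit zero-padded substitution.
keys = [str(i).zfill(12) for i in range(3)]
print(keys)  # ['000000000000', '000000000001', '000000000002']
```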
tcpdump:
I can see the requests and responses in tcpdump.
and redis-benchmark's output:
====== get __rand_int__ ======
2 requests completed in 0.57 seconds
....
Both requests succeeded.
And the Stirling_wrapper output:
[redis_events] time_:[2024-03-03 20:19:02] upid:[{0:500714:1082513724}] remote_addr:[10.0.4.2] remote_port:[6379] local_addr:[-] local_port:[-1] trace_role:[2] req_cmd:[SAVE] req_args:[["3600 1 300 100 60 10000"]] resp:[] latency:[0 seconds] px_info_:[conn_tracker=conn_id=[upid=500714:1082513724 fd=4 gen=10825137247545168] state=kTransferring remote_addr=10.0.4.2:6379 role=kRoleClient protocol=kProtocolRedis ssl=false record=req=[base=[timestamp_ns=1709497142140662188 byte_size=31] payload=[["3600 1 300 100 60 10000"]] command=SAVE] resp=[base=[timestamp_ns=1709497142140662188 byte_size=0] payload=[] command=] role_swapped=true]
[redis_events] time_:[2024-03-03 20:19:02] upid:[{0:500714:1082513724}] remote_addr:[10.0.4.2] remote_port:[6379] local_addr:[-] local_port:[-1] trace_role:[1] req_cmd:[CONFIG GET] req_args:[{"parameter":"appendonly"}] resp:[["appendonly","no"]] latency:[0.000650927 seconds] px_info_:[conn_tracker=conn_id=[upid=500714:1082513724 fd=4 gen=10825137247545168] state=kTransferring remote_addr=10.0.4.2:6379 role=kRoleClient protocol=kProtocolRedis ssl=false record=req=[base=[timestamp_ns=1709497142140011261 byte_size=36] payload=[{"parameter":"appendonly"}] command=CONFIG GET] resp=[base=[timestamp_ns=1709497142140662188 byte_size=19] payload=[["appendonly","no"]] command=] role_swapped=false]
[dns_events] time_:[2024-03-03 20:19:15] upid:[{0:2606493:423948407}] remote_addr:[183.60.83.19] remote_port:[53] local_addr:[-] local_port:[-1] trace_role:[1] req_header:[{"txid":64014,"qr":0,"opcode":0,"aa":0,"tc":0,"rd":1,"ra":0,"ad":0,"cd":0,"rcode":0,"num_queries":1,"num_answers":0,"num_auth":0,"num_addl":0}] req_body:[{"queries":[{"name":"s3-r-w.eu-west-3.amazonaws.com","type":"AAAA"}]}] resp_header:[{"txid":64014,"qr":1,"opcode":0,"aa":0,"tc":0,"rd":1,"ra":1,"ad":0,"cd":0,"rcode":0,"num_queries":1,"num_answers":0,"num_auth":1,"num_addl":0}] resp_body:[{"answers":[]}] latency:[0.00119556 seconds] px_info_:[conn_tracker=conn_id=[upid=2606493:423948407 fd=17 gen=10825150425737512] state=kTransferring remote_addr=183.60.83.19:53 role=kRoleClient protocol=kProtocolDNS ssl=false record=req=[header={"txid":64014,"qr":0,"opcode":0,"aa":0,"tc":0,"rd":1,"ra":0,"ad":0,"cd":0,"rcode":0,"num_queries":1,"num_answers":0,"num_auth":0,"num_addl":0} qu
I20240304 04:19:16.718147 500951 uprobe_manager.cc:985] Number of uprobes deployed = 27
I20240304 04:19:21.739650 501002 uprobe_manager.cc:985] Number of uprobes deployed = 27
I20240304 04:19:24.350064 501034 uprobe_manager.cc:985] Number of uprobes deployed = 9
Note that the two Redis commands above (SAVE and CONFIG GET) are not the GET commands I sent; they were issued by the redis-benchmark tool itself.
How to reproduce?
I believe this issue can be reproduced by creating a few very large (over 3 MB) key-value pairs and then sending consecutive GET requests for them.
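For context on what the tracer has to handle here: a 3 MB GET reply arrives as a single RESP bulk string, so the parser must buffer the entire payload before it can emit a record. A minimal sketch of the wire framing (the value size is a hypothetical 3 MB):

```python
# RESP2 bulk-string framing for a ~3 MB GET reply:
# "$<length>\r\n<payload>\r\n"
value = b"x" * (3 * 1024 * 1024)  # hypothetical 3 MB value
reply = b"$" + str(len(value)).encode() + b"\r\n" + value + b"\r\n"
# Header "$3145728\r\n" (10 bytes) + 3145728-byte payload + trailing CRLF.
print(len(reply))  # 3145740 bytes on the wire
```

If the tracer's per-connection capture or buffer limit is smaller than this, the response bytes would be truncated and the record dropped, which would match the observed behavior.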
My machine environment:
Linux VM-4-9-ubuntu 5.15.0-72-generic #79-Ubuntu SMP Wed Apr 19 08:22:18 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Redis version:
Redis server v=7.2.4
Stirling version:
built from source (main branch)
@hengyoush thanks for the detailed bug report. There are some well-known issues that cause missing protocol traces as payload size increases (#1755, for example). I need to spend some time following your reproduction steps and investigating the source of the issue, and I hope to do that over the next few days.