The Stirling Socket Tracer fails to record relevant request and response entries when the response is excessively large
hengyoush opened this issue · comments
Describe the bug
I built Stirling and used Stirling_wrapper for testing. I used redis-benchmark to send two consecutive GET commands. The value for the key is large, around 3 MB. I can see the requests and responses in tcpdump, but they do not appear in the Stirling_wrapper output. For smaller values, in the range of a few tens of bytes, I can observe them in the Stirling_wrapper output.
The redis-benchmark command I executed (I pre-created the key and the large value):
redis-benchmark -h {ip} -r 3 -c 1 -n 2 get __rand_int__
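For reference, assuming redis-benchmark's usual behavior of substituting `__rand_int__` with a zero-padded 12-digit random integer in `[0, r)`, the `-r 3` flag means only three keys can be requested, so these are the keys that would need pre-created 3 MB values:

```python
# Keys redis-benchmark can generate for "get __rand_int__" with -r 3,
# assuming its standard 12-digit zero-padded substitution.
keys = [str(i).zfill(12) for i in range(3)]
print(keys)  # ['000000000000', '000000000001', '000000000002']
```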
tcpdump:
I can see the requests and responses in tcpdump.
and redis-benchmark's output:
====== get __rand_int__ ======
2 requests completed in 0.57 seconds
....
Both requests succeeded.
And the Stirling_wrapper output:
[redis_events] time_:[2024-03-03 20:19:02] upid:[{0:500714:1082513724}] remote_addr:[10.0.4.2] remote_port:[6379] local_addr:[-] local_port:[-1] trace_role:[2] req_cmd:[SAVE] req_args:[["3600 1 300 100 60 10000"]] resp:[] latency:[0 seconds] px_info_:[conn_tracker=conn_id=[upid=500714:1082513724 fd=4 gen=10825137247545168] state=kTransferring remote_addr=10.0.4.2:6379 role=kRoleClient protocol=kProtocolRedis ssl=false record=req=[base=[timestamp_ns=1709497142140662188 byte_size=31] payload=[["3600 1 300 100 60 10000"]] command=SAVE] resp=[base=[timestamp_ns=1709497142140662188 byte_size=0] payload=[] command=] role_swapped=true]
[redis_events] time_:[2024-03-03 20:19:02] upid:[{0:500714:1082513724}] remote_addr:[10.0.4.2] remote_port:[6379] local_addr:[-] local_port:[-1] trace_role:[1] req_cmd:[CONFIG GET] req_args:[{"parameter":"appendonly"}] resp:[["appendonly","no"]] latency:[0.000650927 seconds] px_info_:[conn_tracker=conn_id=[upid=500714:1082513724 fd=4 gen=10825137247545168] state=kTransferring remote_addr=10.0.4.2:6379 role=kRoleClient protocol=kProtocolRedis ssl=false record=req=[base=[timestamp_ns=1709497142140011261 byte_size=36] payload=[{"parameter":"appendonly"}] command=CONFIG GET] resp=[base=[timestamp_ns=1709497142140662188 byte_size=19] payload=[["appendonly","no"]] command=] role_swapped=false]
[dns_events] time_:[2024-03-03 20:19:15] upid:[{0:2606493:423948407}] remote_addr:[183.60.83.19] remote_port:[53] local_addr:[-] local_port:[-1] trace_role:[1] req_header:[{"txid":64014,"qr":0,"opcode":0,"aa":0,"tc":0,"rd":1,"ra":0,"ad":0,"cd":0,"rcode":0,"num_queries":1,"num_answers":0,"num_auth":0,"num_addl":0}] req_body:[{"queries":[{"name":"s3-r-w.eu-west-3.amazonaws.com","type":"AAAA"}]}] resp_header:[{"txid":64014,"qr":1,"opcode":0,"aa":0,"tc":0,"rd":1,"ra":1,"ad":0,"cd":0,"rcode":0,"num_queries":1,"num_answers":0,"num_auth":1,"num_addl":0}] resp_body:[{"answers":[]}] latency:[0.00119556 seconds] px_info_:[conn_tracker=conn_id=[upid=2606493:423948407 fd=17 gen=10825150425737512] state=kTransferring remote_addr=183.60.83.19:53 role=kRoleClient protocol=kProtocolDNS ssl=false record=req=[header={"txid":64014,"qr":0,"opcode":0,"aa":0,"tc":0,"rd":1,"ra":0,"ad":0,"cd":0,"rcode":0,"num_queries":1,"num_answers":0,"num_auth":0,"num_addl":0} qu
I20240304 04:19:16.718147 500951 uprobe_manager.cc:985] Number of uprobes deployed = 27
I20240304 04:19:21.739650 501002 uprobe_manager.cc:985] Number of uprobes deployed = 27
I20240304 04:19:24.350064 501034 uprobe_manager.cc:985] Number of uprobes deployed = 9
Note that the two Redis commands above (SAVE and CONFIG GET) are not the GET commands I sent; they were issued by the redis-benchmark tool itself.
How to reproduce?
I believe this issue can be reproduced by creating a few very large (over 3 MB) key-value pairs and then sending consecutive GET requests for them.
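For context on what the tracer has to handle here: a 3 MB GET reply arrives as a single RESP bulk string, so the parser must buffer the entire payload before it can emit a record. A minimal sketch of the wire framing (the value size is a hypothetical 3 MB):

```python
# RESP2 bulk-string framing for a ~3 MB GET reply:
# "$<length>\r\n<payload>\r\n"
value = b"x" * (3 * 1024 * 1024)  # hypothetical 3 MB value
reply = b"$" + str(len(value)).encode() + b"\r\n" + value + b"\r\n"
# Header "$3145728\r\n" (10 bytes) + 3145728-byte payload + trailing CRLF.
print(len(reply))  # 3145740 bytes on the wire
```

If the tracer's per-connection capture or buffer limit is smaller than this, the response bytes would be truncated and the record dropped, which would match the observed behavior.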
My machine environment:
Linux VM-4-9-ubuntu 5.15.0-72-generic #79-Ubuntu SMP Wed Apr 19 08:22:18 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Redis version:
Redis server v=7.2.4
Stirling version:
built from source (main branch)
@hengyoush thanks for the detailed bug report. There are some well-known issues that cause missing protocol traces as payload size increases (#1755, for example). I need to spend some time following your reproduction steps and investigating the source of the issue, and I hope to do that over the next few days.