daeuniverse / dae

eBPF-based Linux high-performance transparent proxy solution.

Home Page:https://dae.v2raya.org

[Update] Proxy stops working after upgrading to 0.6.0rc1

jchen290 opened this issue

Checks

  • I have searched the existing issues
  • I have read the documentation
  • Is it your first time submitting an issue

Current Behavior

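After upgrading to 0.6.0rc2, proxying no longer works. Rolling back to 0.5.1 restores normal behavior. I have confirmed that the configuration in config.dae is correct.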

Expected Behavior

Proxying works normally on 0.6.0rc2.

Steps to Reproduce

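A quick comparison of the logs from the two versions shows:

  1. After upgrading to 0.6.0rc2, the group policies in routing no longer seem to take effect; all requests fall back to the default proxy group.
  2. After upgrading to 0.6.0rc2, sniffed= is always empty in the request logs (I suspect this is where the problem lies). After rolling back to 0.5.1, sniffed= contains the domain and proxying works normally.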

0.6.0
network=tcp4 outbound=proxy pid=0 pname= policy=min_moving_avg sniffed=

0.5.1
network=tcp4 outbound=proxy pid=0 pname= policy=min_moving_avg sniffed=play.googleapis.com
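
To compare the two versions over a longer log, the empty sniffed= field can be detected mechanically. The following is a minimal Python sketch, assuming each log line carries space-separated key=value fields with no spaces inside values, as in the two excerpts above; the helper names parse_fields and sniff_failed are hypothetical and not part of dae.

```python
def parse_fields(line):
    """Split a log line into a dict of key=value fields.

    Assumption: fields are separated by whitespace and values contain
    no spaces, as in the excerpts quoted in this issue.
    """
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # keep only tokens that actually contain '='
            fields[key] = value
    return fields


def sniff_failed(line):
    """Return True when the line has a sniffed= field with an empty value."""
    fields = parse_fields(line)
    return fields.get("sniffed") == ""
```

Calling sniff_failed on each line of a saved log and counting the True results gives a rough measure of how often sniffing returns nothing on each version.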

Environment

  • Dae 0.6.0rc2:
  • Debian12:

Anything else?

No response

Thanks for opening this issue!

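@jchen290 Does 0.6.0rc1 have this problem too?

Yes, 0.6.0rc1 has the same problem.

Try this: #512

@jschwinger233 Still not working, same problem. The node latency tests all return results, but sniffed= is still empty and pages fail to load.

@jchen290 Please show the contents of /etc/resolv.conf.

@ChanthMiao I haven't touched /etc/resolv.conf; it points to the main router's IP.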
'''
nameserver 10.0.0.154
'''

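@jchen290 Try switching to any public DNS server, for example 223.5.5.5.

@mzz2017, @ChanthMiao, @jschwinger233, I upgraded to the 0.6.0 release yesterday and the problem is the same: proxying does not work. Switching to a public DNS server did not help. The symptom is still that sniffed= is always empty, and everything works again after rolling back to 0.5.1. I also tried rewriting my configuration file based on the new version's example.dae, with the same result.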

@jchen290

  1. Could you check whether #539 fixes your problem?
  2. Also, I suspect that a mechanism introduced in v0.6.0rc1 pushes sniffing past 100ms. Could you raise sniffing_timeout to 10s to verify?

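Hi, we suspect that MTU-related packet drops and retransmissions may be causing the 100ms sniff timeout. Here are two ways to investigate.

Method 1: packet capture

If your system does not carry much traffic, you can simply run tcpdump -ni dae0 -w dae0.pcap and trigger one request whose sniffing fails while the capture is running.

If the system carries a lot of traffic, first use dig to see which IPv4 addresses play.googleapis.com resolves to. For example, if it resolves to 142.251.175.95 and 142.251.12.95, filter by IP: tcpdump -ni dae0 -w dae0.pcap 'host 142.251.12.95 or host 142.251.175.95'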
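
When a name resolves to several addresses, the filter expression can be built instead of typed by hand. This is a minimal Python sketch, not part of dae or tcpdump: resolve_ipv4 and build_host_filter are hypothetical helper names, and resolve_ipv4 uses the system resolver, so its answers may differ from what dig returns against a specific DNS server.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses for hostname.

    Uses the system resolver via socket.getaddrinfo (an assumption;
    the thread itself suggests dig).
    """
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def build_host_filter(addresses):
    """Join addresses into a tcpdump filter such as 'host A or host B'."""
    return " or ".join(f"host {addr}" for addr in addresses)
```

For example, build_host_filter(resolve_ipv4("play.googleapis.com")) yields a string that can be passed as the last argument of the tcpdump command above.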

If a retransmission occurred, it will show up in the dae0.pcap capture file.
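
Scanning a long capture by eye is error-prone, so here is one rough way to look for candidate retransmissions in the text output of tcpdump -nr dae0.pcap. It is a heuristic Python sketch under stated assumptions: it only matches data segments printed as "seq START:END", treats a repeated (source, destination, START, END) tuple as a possible retransmission, and misses retransmissions that are re-segmented into a different range. The name find_repeated_segments is hypothetical.

```python
import re
from collections import Counter

# Matches tcpdump text lines that carry a data segment, for example:
#   "... IP a.https > b.57429: Flags [P.], seq 5540:6529, ack 2325, ..."
# Pure ACKs and SYN/FIN lines without a "start:end" range do not match.
SEGMENT = re.compile(
    r"^\S+ IP (?P<src>\S+) > (?P<dst>\S+): Flags \[[^\]]*\], "
    r"seq (?P<start>\d+):(?P<end>\d+),"
)


def find_repeated_segments(lines):
    """Return {(src, dst, start, end): count} for segments seen more than once."""
    counts = Counter()
    for line in lines:
        match = SEGMENT.match(line.strip())
        if match:
            key = (
                match["src"],
                match["dst"],
                int(match["start"]),
                int(match["end"]),
            )
            counts[key] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

Pass it the lines of the tcpdump text output; an empty result means no identical segment was seen twice, which is weaker than proving that nothing was retransmitted.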

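Method 2: dae trace

dae 0.6+ has a built-in trace subcommand that you can use like a packet capture: dae trace -4 -p tcp --port 443 -o trace.log. trace generates a large volume of logs, so keep the run as short as possible; once you have triggered one sniff failure, press Ctrl-C to interrupt it (p.s. the process is slow to terminate).

If an MTU drop occurred, we will see SKB_DROP_REASON_PKT_TOO_BIG in trace.log.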
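
Because trace.log can be large, a plain substring search is enough to answer the question. The Python sketch below makes no assumption about the trace line format beyond the drop reason appearing as text on the line; lines_with_drop_reason is a hypothetical helper name.

```python
def lines_with_drop_reason(lines, reason="SKB_DROP_REASON_PKT_TOO_BIG"):
    """Return (line_number, text) pairs for every line mentioning reason."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if reason in line
    ]


# Usage sketch (assumes trace.log is in the current directory):
#
#     with open("trace.log", encoding="utf-8", errors="replace") as log:
#         for number, text in lines_with_drop_reason(log):
#             print(number, text)
```

An empty list means the drop reason never appears in the file, which under the hypothesis above would argue against an MTU drop.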

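@jchen290

  1. Could you check whether fix: incidental packet drop and weird UDP state maintaining #539 fixes your problem?
  2. Also, I suspect that a mechanism introduced in v0.6.0rc1 pushes sniffing past 100ms. Could you raise sniffing_timeout to 10s to verify?

  1. I upgraded to the latest 0.7.0rc1 and the problem is the same.
  2. I tried raising sniffing_timeout to 20s and the problem persists.

Next I will follow jschwinger233's suggestion and capture packets to check for MTU-related drops.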

@mzz2017 @jschwinger233 @douglarek

I'm not very familiar with tcpdump; could you take a look and see whether you have any ideas? I tried to visit google.com with sniffing_timeout set to 10s.

reading from file dae0.pcap, link-type EN10MB (Ethernet), snapshot length 262144
20:23:31.357081 IP 52.113.194.132.https > 10.0.0.231.57414: Flags [.], ack 2742141872, win 501, length 0
20:23:31.869090 IP 20.198.162.78.https > 10.0.0.231.57415: Flags [.], ack 849498961, win 501, length 0
20:23:31.869564 IP 10.0.0.231.57415 > 20.198.162.78.https: Flags [.], ack 1, win 8209, length 0
20:23:35.197092 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57421: Flags [.], seq 145383854:145385294, ack 1810262108, win 501, options [nop,nop,sack 1 {496:531}], length 1440
20:23:35.197893 IP 10.0.0.231.57421 > nuq04s45-in-f4.1e100.net.https: Flags [.], ack 20020, win 8212, options [nop,nop,sack 1 {0:1440}], length 0
20:23:35.548031 IP 10.0.0.231.57425 > sfo03s26-in-f14.1e100.net.https: Flags [P.], seq 2330269262:2330269798, ack 3625082385, win 8212, length 536
20:23:35.548055 IP sfo03s26-in-f14.1e100.net.https > 10.0.0.231.57425: Flags [.], ack 536, win 501, length 0
20:23:38.146706 IP 10.0.0.231.57421 > nuq04s45-in-f4.1e100.net.https: Flags [.], seq 676:1073, ack 20020, win 8212, length 397
20:23:38.146720 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57421: Flags [.], ack 1, win 501, options [nop,nop,sack 2 {676:1073}{496:531}], length 0
20:23:38.558720 IP 10.0.0.231.57427 > nuq04s45-in-f4.1e100.net.https: Flags [S], seq 3212874665, win 64240, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0
20:23:38.558743 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57427: Flags [S.], seq 1658564407, ack 3212874666, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
20:23:38.559391 IP 10.0.0.231.57427 > nuq04s45-in-f4.1e100.net.https: Flags [.], ack 1, win 8212, length 0
20:23:38.559889 IP 10.0.0.231.57427 > nuq04s45-in-f4.1e100.net.https: Flags [.], seq 1:1453, ack 1, win 8212, length 1452
20:23:38.559907 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57427: Flags [.], ack 1453, win 501, length 0
20:23:38.872037 IP 10.0.0.231.57428 > nuq04s45-in-f4.1e100.net.https: Flags [S], seq 4257475131, win 64240, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0
20:23:38.872060 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57428: Flags [S.], seq 3342667260, ack 4257475132, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
20:23:38.872120 IP 10.0.0.231.57429 > nuq04s45-in-f4.1e100.net.https: Flags [S], seq 1106725268, win 64240, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0
20:23:38.872129 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57429: Flags [S.], seq 808905219, ack 1106725269, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
20:23:38.872363 IP 10.0.0.231.57430 > nuq04s45-in-f4.1e100.net.https: Flags [S], seq 1464066675, win 64240, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0
20:23:38.872374 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57430: Flags [S.], seq 1511011951, ack 1464066676, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
20:23:38.872452 IP 10.0.0.231.57428 > nuq04s45-in-f4.1e100.net.https: Flags [.], ack 1, win 8212, length 0
20:23:38.872505 IP 10.0.0.231.57429 > nuq04s45-in-f4.1e100.net.https: Flags [.], ack 1, win 8212, length 0
20:23:38.872657 IP 10.0.0.231.57430 > nuq04s45-in-f4.1e100.net.https: Flags [.], ack 1, win 8212, length 0
20:23:38.873134 IP 10.0.0.231.57428 > nuq04s45-in-f4.1e100.net.https: Flags [.], seq 1:1453, ack 1, win 8212, length 1452
20:23:38.873148 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57428: Flags [.], ack 1453, win 501, length 0
20:23:38.873653 IP 10.0.0.231.57429 > nuq04s45-in-f4.1e100.net.https: Flags [.], seq 1:1453, ack 1, win 8212, length 1452
20:23:38.873671 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57429: Flags [.], ack 1453, win 501, length 0
20:23:38.874140 IP 10.0.0.231.57430 > nuq04s45-in-f4.1e100.net.https: Flags [.], seq 1:1453, ack 1, win 8212, length 1452
20:23:38.874155 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57430: Flags [.], ack 1453, win 501, length 0
20:23:40.674591 IP 10.0.0.231.57427 > nuq04s45-in-f4.1e100.net.https: Flags [.], seq 1453:1989, ack 1, win 8212, length 536
20:23:40.674615 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57427: Flags [.], ack 1989, win 501, length 0
20:23:40.828967 IP dl-in-f84.1e100.net.https > 10.0.0.231.57416: Flags [.], ack 2822882291, win 501, length 0
20:23:40.829470 IP 10.0.0.231.57416 > dl-in-f84.1e100.net.https: Flags [.], ack 1, win 8212, length 0
20:23:41.006761 IP 10.0.0.231.57430 > nuq04s45-in-f4.1e100.net.https: Flags [P.], seq 1453:1757, ack 1, win 8212, length 304
20:23:41.006764 IP 10.0.0.231.57429 > nuq04s45-in-f4.1e100.net.https: Flags [P.], seq 1453:1789, ack 1, win 8212, length 336
20:23:41.006765 IP 10.0.0.231.57428 > nuq04s45-in-f4.1e100.net.https: Flags [.], seq 1453:1989, ack 1, win 8212, length 536
20:23:41.006791 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57430: Flags [.], ack 1757, win 501, length 0
20:23:41.006801 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57429: Flags [.], ack 1789, win 501, length 0
20:23:41.006802 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57428: Flags [.], ack 1989, win 501, length 0
20:23:41.468702 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57429: Flags [P.], seq 1:5540, ack 1789, win 501, length 5539
20:23:41.468824 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57430: Flags [P.], seq 1:2701, ack 1757, win 501, length 2700
20:23:41.468875 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57430: Flags [P.], seq 2701:5539, ack 1757, win 501, length 2838
20:23:41.470352 IP 10.0.0.231.57430 > nuq04s45-in-f4.1e100.net.https: Flags [F.], seq 1831, ack 5539, win 8212, length 0
20:23:41.470368 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57430: Flags [.], ack 1757, win 501, options [nop,nop,sack 1 {1831:1832}], length 0
20:23:41.470450 IP 10.0.0.231.57427 > nuq04s45-in-f4.1e100.net.https: Flags [F.], seq 2243, ack 1, win 8212, length 0
20:23:41.470463 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57427: Flags [.], ack 1989, win 501, options [nop,nop,sack 1 {2243:2244}], length 0
20:23:41.470472 IP 10.0.0.231.57428 > nuq04s45-in-f4.1e100.net.https: Flags [F.], seq 2211, ack 1, win 8212, length 0
20:23:41.470477 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57428: Flags [.], ack 1989, win 501, options [nop,nop,sack 1 {2211:2212}], length 0
20:23:41.472966 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57429: Flags [P.], seq 4357:5540, ack 1789, win 501, length 1183
20:23:41.590192 IP 10.0.0.231.57427 > nuq04s45-in-f4.1e100.net.https: Flags [FP.], seq 1989:2243, ack 1, win 8212, length 254
20:23:41.632968 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57427: Flags [.], ack 2244, win 501, length 0
20:23:41.681069 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57429: Flags [.], seq 1:1453, ack 1789, win 501, length 1452
20:23:41.782024 IP 10.0.0.231.57430 > nuq04s45-in-f4.1e100.net.https: Flags [FP.], seq 1757:1831, ack 5539, win 8212, length 74
20:23:41.824971 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57430: Flags [.], ack 1832, win 501, length 0
20:23:41.921755 IP 10.0.0.231.57428 > nuq04s45-in-f4.1e100.net.https: Flags [FP.], seq 1989:2211, ack 1, win 8212, length 222
20:23:41.965077 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57428: Flags [.], ack 2212, win 501, length 0
20:23:42.049360 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57427: Flags [P.], seq 1:1464, ack 2244, win 501, length 1463
20:23:42.049426 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57427: Flags [F.], seq 1464, ack 2244, win 501, length 0
20:23:42.049727 IP 10.0.0.231.57427 > nuq04s45-in-f4.1e100.net.https: Flags [R.], seq 2244, ack 1453, win 0, length 0
20:23:42.108997 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57429: Flags [.], seq 1:1453, ack 1789, win 501, length 1452
20:23:42.109470 IP 10.0.0.231.57429 > nuq04s45-in-f4.1e100.net.https: Flags [.], ack 5540, win 8212, options [nop,nop,sack 1 {1:1453}], length 0
20:23:42.338307 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57430: Flags [P.], seq 5539:6497, ack 1832, win 501, length 958
20:23:42.338354 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57430: Flags [F.], seq 6497, ack 1832, win 501, length 0
20:23:42.338669 IP 10.0.0.231.57430 > nuq04s45-in-f4.1e100.net.https: Flags [R.], seq 1832, ack 6497, win 0, length 0
20:23:42.489371 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57428: Flags [P.], seq 1:1464, ack 2212, win 501, length 1463
20:23:42.489430 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57428: Flags [F.], seq 1464, ack 2212, win 501, length 0
20:23:42.489713 IP 10.0.0.231.57428 > nuq04s45-in-f4.1e100.net.https: Flags [R.], seq 2212, ack 1453, win 0, length 0
20:23:44.771840 IP 10.0.0.231.57429 > nuq04s45-in-f4.1e100.net.https: Flags [P.], seq 1789:2325, ack 5540, win 8212, length 536
20:23:44.771862 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57429: Flags [.], ack 2325, win 501, length 0
20:23:45.033994 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57429: Flags [P.], seq 5540:6529, ack 2325, win 501, length 989
20:23:45.241072 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57429: Flags [P.], seq 5540:6529, ack 2325, win 501, length 989
20:23:46.077036 IP nuq04s45-in-f4.1e100.net.https > 10.0.0.231.57429: Flags [P.], seq 5540:6529, ack 2325, win 501, length 989
20:23:46.077531 IP 10.0.0.231.57429 > nuq04s45-in-f4.1e100.net.https: Flags [.], ack 6529, win 8212, options [nop,nop,sack 1 {5540:6529}], length 0
20:23:46.461075 IP 52.113.194.132.https > 10.0.0.231.57414: Flags [.], ack 1, win 501, length 0
20:23:46.973010 IP 20.198.162.78.https > 10.0.0.231.57415: Flags [.], ack 1, win 501, length 0
20:23:46.973472 IP 10.0.0.231.57415 > 20.198.162.78.https: Flags [.], ack 1, win 8209, length 0

@jchen290 I don't see any retransmissions by eye. Since raising the sniff timeout did not help, the cause is not what we guessed earlier (the first payload after the handshake being dropped).

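@jschwinger233 Thanks. Are there any other ideas for troubleshooting? From what I saw in the group chat, I seem to be the only one with this problem.

By the way, I tried capturing with the command "dae trace -4 -p tcp --port 443 -o trace.log", but it fails with the following error: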

failed to load BPF: field KprobeSkb1: program kprobe_skb_1: map events: map create: cannot allocate memory (without BTF k/v)