Frequent "failed to exchange" errors

Question

Frequent "failed to exchange" errors

timkgh opened this issue 5 months ago · comments

dnsproxy[1303399]: 2024/05/06 00:40:59 [error] dnsproxy: upstream https://1.1.1.2:443/dns-query failed to exchange

dnsproxy v0.71.1

Requests also take a long time to time out when this happens, around 6s.

It happens for both DoH (h3://) and DoT (tls://) upstreams, on Linux both amd64 and arm64. Observed for both Quad9 and Cloudflare upstreams.

Andrey Meshkov · Answer 1 · Tue May 07 2024 01:03:00 GMT+0800 (China Standard Time)

We'll need more information about this, could you please record a full log?

timkgh · Answer 2 · Tue May 07 2024 08:02:26 GMT+0800 (China Standard Time)

Does this help?

2024/05/05 22:01:59 [error] dnsproxy: upstream https://1.1.1.2:443/dns-query failed to exc
hange ;play.google.com.        IN         A in 26.008209ms: requesting https://1.1.1.2:443/dns-query: Get_0rtt "https://1.1.1.2:443/dns-query?dns=AAABAAA...": use of closed network connection

2024/05/06 00:40:59 [error] dnsproxy: upstream https://1.1.1.2:443/dns-query failed to exc
hange ;addons-pa.clients6.google.com.        IN         A in 5.979076ms: requesting https://1.1.1.2:443/dns-query: Get_0rtt "https://1.1.1.2:443/dns-query?dns=AAAB...": H3_NO_ERROR

2024/05/06 11:10:59 [error] dnsproxy: upstream https://1.1.1.2:443/dns-query failed to exchange ;waa-pa.clients6.google.com.        IN         A in 31.069075ms: requesting https://1.1.1.2:443/dns-query: Get_0rtt "https://1.1.1.2:443/dns-query?dns=AAABAAA...": use of closed network connection

2024/05/06 13:05:59 [error] dnsproxy: upstream https://1.1.1.2:443/dns-query failed to exchange ;addons-pa.clients6.google.com.        IN         A in 26.284076ms: requesting https://1.1.1.2:443/dns-query: Get_0rtt "https://1.1.1.2:443/dns-query?dns=AAABAAA...": use of closed network connection

Andrey Meshkov · Answer 3 · Thu May 09 2024 03:33:48 GMT+0800 (China Standard Time)

Not really:(

We'll need to see the whole lifetime of a connection to figure out what's going on with them.

Could you please share the log in private via devteam@adguard.com?

timkgh · Answer 4 · Thu May 09 2024 05:09:41 GMT+0800 (China Standard Time)

H3_NO_ERROR looks strange, doesn't it?

Can this not be reproduced/debugged on your side by using --all-servers with h3:// and tls:// for both 9.9.9.11 and 1.1.1.2? I'm sure you'll get to see these errors after a while.

Andrey Meshkov · Answer 5 · Fri May 10 2024 06:11:43 GMT+0800 (China Standard Time)

All those related to H3/Quic may be caused by different quic-go library issues, I am tbh more interested in what you had with DoT.

Errors in the log may be reproduced, at some point the server closes the connection and we handle this gracefully, just repeating the request. I don't yet understand where the 6 seconds timeout is getting from. If it happens with DoT than this is very strange. If it happens with H3/DoQ then it's understandable since quic-go does not handle GOAWAY frames from the server properly at the moment.

timkgh · Answer 6 · Fri May 10 2024 06:27:49 GMT+0800 (China Standard Time)

The timeouts (or errors that take a long time) happen with TLS too:

2024/05/09 10:38:24 [error] dnsproxy: upstream tls://149.112.112.11:853 failed to exchange ;fireoscaptiveportal.com.        IN         A in 6.112033ms: reading response from tls://149.112.112.11:853: EOF

2024/05/09 14:01:15 [error] dnsproxy: upstream tls://[2620:fe::fe:11]:853 failed to exchange ;vscode-sync.trafficmanager.net.        IN         A in 5.283909ms: reading response from tls://[2620:fe::fe:11]:853: EOF

Andrey Meshkov · Answer 7 · Fri May 17 2024 07:25:04 GMT+0800 (China Standard Time)

I inspected the DoT upstream code and tbh I believe this is some network issue and not a bug in dnsproxy.

DoT retries establishing a new connection when there's any error with the cached one but according to your log even though the new connection was established it's getting closed right away. I don't see how dnsproxy itself could do that in the code.

Jörgen · Answer 8 · Fri May 17 2024 14:00:06 GMT+0800 (China Standard Time)

The timeouts (or errors that take a long time) happen with TLS too:

I was suddenly seeing timeouts too with DoT using v0.71.1, so removed all DoT from upstream-servers last week and only using Quic/H3 now without any issues. Strange...

Andrey Meshkov · Answer 9 · Fri May 17 2024 17:02:30 GMT+0800 (China Standard Time)

@iJorgen the same DNS upstream?

Jörgen · Answer 10 · Fri May 17 2024 18:28:11 GMT+0800 (China Standard Time)

@iJorgen the same DNS upstream?

Sorry @ameshkov, only keeping logs for a few days and not using DoT upstreams at the moment.
Smells like a new release of DNSProxy soon, so might try DoT again. 😃

timkgh · Answer 11 · Fri May 17 2024 23:54:49 GMT+0800 (China Standard Time)

I inspected the DoT upstream code and tbh I believe this is some network issue and not a bug in dnsproxy.

DoT retries establishing a new connection when there's any error with the cached one but according to your log even though the new connection was established it's getting closed right away. I don't see how dnsproxy itself could do that in the code.

I observe the same issues on 2 different cloud provider networks, both IPv4 and IPv6, 2 different architectures (Linux amd64 and arm64), 2 different upstreams (Quad9, Cloudflare) and for both DoT and DoH. Maybe there are no actual issues and it's just log noise.

paschun · Answer 12 · Sat Jun 01 2024 15:58:17 GMT+0800 (China Standard Time)

I'm also seeing this with multiple different udp upstreams, v0.71.2 . All requests fail

2024/06/01 00:44:23 [error] dnsproxy: x.x.x.x:53: response received over udp: "exchanging with x.x.x.x:53 over udp: read udp y.y.y.y:55286->x.x.x.x:53: i/o timeout"
2024/06/01 00:44:23 [error] dnsproxy: x.x.x.x:53: response received over udp: "exchanging with x.x.x.x:53 over udp: read udp y.y.y.y:51487->x.x.x.x:53: i/o timeout"
2024/06/01 00:44:23 [error] dnsproxy: x.x.x.x:53: response received over udp: "exchanging with x.x.x.x:53 over udp: read udp y.y.y.y:51189->x.x.x.x:53: i/o timeout"
2024/06/01 00:44:23 [error] dnsproxy: upstream x.x.x.x:53 failed to exchange ;example.com.	IN	 A in 4.000258183s: exchanging with x.x.x.x:53 over udp: read udp y.y.y.y:51189->x.x.x.x:53: i/o timeout
2024/06/01 00:44:23 [error] dnsproxy: upstream x.x.x.x:53 failed to exchange ;example.com.	IN	 AAAA in 4.000582435s: exchanging with x.x.x.x:53 over udp: read udp y.y.y.y:51487->x.x.x.x:53: i/o timeout
2024/06/01 00:44:23 [error] dnsproxy: upstream x.x.x.x:53 failed to exchange ;example.com.	IN	 HTTPS in 4.000550868s: exchanging with x.x.x.x:53 over udp: read udp y.y.y.y:55286->x.x.x.x:53: i/o timeout

Edit: nevermind, it was my firewall issue