google / gvisor

Application Kernel for Containers

Home Page: https://gvisor.dev

netstack: performance w/TCP-RACK on Windows

jwhited opened this issue · comments

Description

Our usage of netstack within tailscale performs poorly on Windows with the following stack settings:

  • default congestion control (reno)
  • tcpip.TCPSACKEnabled(true)
  • default TCP loss recovery (tcpip.TCPRACKLossDetection)

Using Stack.AddTCPProbe() to print the congestion window (in packets) shows the window being held below 10 packets during a throughput test:

	// Log the sender's congestion window at most once per second.
	var lastDebug time.Time
	ipstack.AddTCPProbe(func(s *stack.TCPEndpointState) {
		now := time.Now()
		if now.After(lastDebug.Add(time.Second)) {
			logf("%s:%d => %s:%d cwnd in packets: %d", s.ID.LocalAddress.String(), s.ID.LocalPort, s.ID.RemoteAddress.String(), s.ID.RemotePort, s.Sender.SndCwnd)
			lastDebug = now
		}
	})
2023/11/29 18:58:24 100.78.224.154:80 => 100.90.1.8:64349 cwnd in packets: 7
2023/11/29 18:58:24 100.78.224.154:80 => 100.90.1.8:64348 cwnd in packets: 9
2023/11/29 18:58:24 100.78.224.154:80 => 100.90.1.8:64347 cwnd in packets: 5
2023/11/29 18:58:25 100.78.224.154:80 => 100.90.1.8:64349 cwnd in packets: 5
2023/11/29 18:58:26 100.78.224.154:80 => 100.90.1.8:64351 cwnd in packets: 8
2023/11/29 18:58:26 100.78.224.154:80 => 100.90.1.8:64349 cwnd in packets: 7
2023/11/29 18:58:27 100.78.224.154:80 => 100.90.1.8:64350 cwnd in packets: 9

Throughput is poor (~8Mb/s). Setting TCP loss recovery to 0 (disabling TCP-RACK) improves throughput roughly tenfold (8Mb/s => 80Mb/s), and the congestion window grows in a more expected fashion. The path under test is not particularly lossy.

Linux does not exhibit the same behavior; the issue appears to be Windows-specific. It has been reproduced by multiple users in multiple environments on Windows 11 and Windows Server 2022.

Originally reported via tailscale/tailscale#9707

Steps to reproduce

tailscale/tailscale#9707 (comment) describes steps to reproduce using tailscale. We have since changed loss recovery on Windows as a workaround via tailscale/tailscale@5e861c3.
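The workaround amounts to choosing the loss-recovery setting per OS before configuring the stack. Below is a minimal sketch of that per-OS selection, with a locally defined constant standing in for gVisor's tcpip.TCPRACKLossDetection (value assumed for illustration; the real code applies the option via Stack.SetTransportProtocolOption):

```go
package main

import "fmt"

// tcpRACKLossDetection stands in for gVisor's tcpip.TCPRACKLossDetection
// bit (value assumed here; check pkg/tcpip/tcpip.go for the real one).
const tcpRACKLossDetection = 1

// recoveryFor picks the TCP loss-recovery setting per OS, following the
// workaround above: disable RACK on Windows, keep it elsewhere.
func recoveryFor(goos string) int {
	if goos == "windows" {
		return 0 // no RACK: sidesteps the throughput collapse
	}
	return tcpRACKLossDetection
}

func main() {
	fmt.Println(recoveryFor("windows")) // 0
	fmt.Println(recoveryFor("linux"))   // 1
}
```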

Reproduced at both gVisor HEAD (4b4191b) and the version tailscale currently uses (4fe3006).

runsc version

No response

docker version (if using docker)

No response

uname

No response

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

No response

Adding @nybidari, who knows more about RACK.

Interesting that this is Windows-only, as I wouldn't expect that to matter. Maybe something to do with timers (since RACK is time-based) is OS-dependent?

FWIW I have tested with higher-resolution timing, but found no difference in the results:

// golang.org/x/sys/windows: raise the system timer resolution to 1ms.
err := windows.TimeBeginPeriod(1)
if err != nil {
	panic(err)
}
// Paired with windows.TimeEndPeriod(1) on shutdown.

I just now realized that tcpip.TCPRACKStaticReoWnd and tcpip.TCPRACKNoDupTh are meant to be OR'd on top of tcpip.TCPRACKLossDetection, and they are unused anyway. So when I was using those values alone, it was the same as no RACK. Removed that bit from the description.
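A quick way to see why: the recovery constants are bit flags, so setting only the modifier bits leaves the RACK-enable bit clear. A minimal sketch with locally defined constants mirroring gVisor's tcpip.TCPRecovery values (assumed from pkg/tcpip/tcpip.go; verify against the gVisor version in use):

```go
package main

import "fmt"

// These values mirror gVisor's tcpip.TCPRecovery bit flags (assumed from
// pkg/tcpip/tcpip.go; verify against the gVisor version in use).
const (
	TCPRACKLossDetection = 1 << iota // bit 0: enable RACK loss detection
	TCPRACKStaticReoWnd              // bit 1: modifier, don't adjust the reorder window
	TCPRACKNoDupTh                   // bit 2: modifier, don't use the DUPACK threshold
)

// rackEnabled reports whether a recovery value actually turns RACK on.
func rackEnabled(recovery int) bool {
	return recovery&TCPRACKLossDetection != 0
}

func main() {
	// Setting only the modifier bits leaves the enable bit clear,
	// which is the same as running without RACK.
	fmt.Println(rackEnabled(TCPRACKStaticReoWnd | TCPRACKNoDupTh)) // false
	fmt.Println(rackEnabled(TCPRACKLossDetection))                 // true
}
```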

I don't think RACK does anything different on Windows compared to other operating systems.
From my understanding, RACK performance can be lower than other loss-recovery schemes in these cases:

  1. Packets were reordered and RACK adjusted the reordering window. Let's say RACK detected a large reordering window.
    If packets are subsequently lost outright (after RACK has grown the reordering window), RACK waits for the reordering-window timeout before detecting the loss, and it takes 16 loss recoveries before the reorder window shrinks back to its initial value. Schemes that do not consider reordering at all would instead enter just one loss recovery falsely.
  2. RTOs: I don't know how, but maybe there are more RTOs with RACK on Windows.

These are just my speculations; the root cause could be something else entirely!
To debug further, would it be possible to get these TCP stats with and without RACK on Windows:
https://github.com/google/gvisor/blob/master/pkg/tcpip/tcpip.go#L2123-L2146 ?

30-second throughput test

Windows Server 2022 No TCP-RACK ~80Mb/s:

2023/11/30 00:57:00 Retransmits: 3299 FastRecovery: 0 SACKRecovery: 52 TLPRecovery: 0 SlowStartRetransmits: 1653 FastRetransmit: 52 Timeouts: 10

Windows Server 2022 TCP-RACK ~8Mb/s:

2023/11/30 00:59:40 Retransmits: 1430 FastRecovery: 0 SACKRecovery: 690 TLPRecovery: 0 SlowStartRetransmits: 4 FastRetransmit: 687 Timeouts: 4

Ubuntu 22.04 No TCP-RACK ~90Mb/s:

2023/11/30 01:05:31 Retransmits: 4251 FastRecovery: 0 SACKRecovery: 66 TLPRecovery: 0 SlowStartRetransmits: 2690 FastRetransmit: 66 Timeouts: 15

Ubuntu 22.04 TCP-RACK ~80Mb/s:

2023/11/30 01:03:07 Retransmits: 2220 FastRecovery: 0 SACKRecovery: 64 TLPRecovery: 0 SlowStartRetransmits: 3 FastRetransmit: 64 Timeouts: 1

A friendly reminder that this issue had no activity for 120 days.

@nybidari any findings? Any reason to believe a more recent release would improve RACK on Windows?