google / gvisor

Application Kernel for Containers

Home Page: https://gvisor.dev


gvisor: panic: Incrementing non-positive count 0xc000533180 on stack.PacketBuffer

jonseymour opened this issue

Description

A change dating from November 11 introduced reference counting for UDP packets, and it appears to have caused a panic in Tailscale, a consumer of this library; see gvisor: panic: Incrementing non-positive count 0xc000533180 on stack.PacketBuffer.

The problem was possibly introduced by one or more of these commits:

84b38f4
2758e11

The evidence for this is that the Tailscale code base did not include these commits until the v1.22.0 release. Earlier versions used a fork of the code base, but Tailscale reverted to the mainline code at tailscale/tailscale@1af2622, after which the issue appeared.
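For reference, switching from a fork back to mainline typically shows up in go.mod along these lines (a hypothetical sketch: the fork path and the "before" versions below are illustrative, not Tailscale's actual entries; only the final pseudo-version, taken from the stack trace below, is real):

// Before: a replace directive substitutes the fork for upstream.
require gvisor.dev/gvisor v0.0.0-20211101000000-aaaaaaaaaaaa
replace gvisor.dev/gvisor => github.com/example/gvisor-fork v0.0.0-20211101000000-bbbbbbbbbbbb

// After: the replace directive is dropped and the upstream
// pseudo-version points at the synced mainline commit.
require gvisor.dev/gvisor v0.0.0-20220126021142-d8aa030b2591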

It is clear from the stack traces that the issues relate to errors in the reference counting of UDP packets, which the commits above did introduce.

AFAICT, a reliable, deterministic test case for this does not exist, but it does seem reasonably clear that there is a problem with the reference counting of UDP packets, so I have raised this issue to at least document the problem. The panic and stack trace follow:

panic: Incrementing non-positive count 0xc0000ea380 on stack.PacketBuffer
goroutine 372 [running]:
gvisor.dev/gvisor/pkg/tcpip/stack.(*packetBufferRefs).IncRef(0xc0000ea380)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/stack/packet_buffer_refs.go:80 +0x105
gvisor.dev/gvisor/pkg/tcpip/transport/udp.(*endpoint).HandlePacket(0xc0001c4c00, {0x998, {0xc00055a7d0, 0x4}, 0xe8b5, {0xc00055a7cc, 0x4}}, 0xc0000ea380)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/transport/udp/endpoint.go:924 +0x2ed
gvisor.dev/gvisor/pkg/tcpip/stack.(*endpointsByNIC).handlePacket(0xc0006331d0, {0x998, {0xc00055a7d0, 0x4}, 0xe8b5, {0xc00055a7cc, 0x4}}, 0xc0000ea380)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/stack/transport_demuxer.go:185 +0x27f
gvisor.dev/gvisor/pkg/tcpip/stack.(*transportDemuxer).deliverPacket(0xc0004601c8, 0x11, 0xc0000ea380, {0x998, {0xc00055a7d0, 0x4}, 0xe8b5, {0xc00055a7cc, 0x4}})
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/stack/transport_demuxer.go:595 +0x3c5
gvisor.dev/gvisor/pkg/tcpip/stack.(*nic).DeliverTransportPacket(0xc00000a1e0, 0x11, 0xc0000ea380)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/stack/nic.go:861 +0x1fe
gvisor.dev/gvisor/pkg/tcpip/network/ipv4.(*endpoint).handleValidatedPacket(0xc0002a3800, {0xc00028c160, 0x14, 0x20}, 0xc0000ea700, {0x0, 0x0})
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/network/ipv4/ipv4.go:998 +0xfd6
gvisor.dev/gvisor/pkg/tcpip/network/ipv4.(*endpoint).HandlePacket(0xc0002a3800, 0x886375)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/network/ipv4/ipv4.go:784 +0x2b8
gvisor.dev/gvisor/pkg/tcpip/stack.(*nic).DeliverNetworkPacket(0xc00000a1e0, {0xc0007efbf0, 0x1}, {0x0, 0x0}, 0x800, 0xc0000ea700)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/stack/nic.go:777 +0x2dd
gvisor.dev/gvisor/pkg/tcpip/link/channel.(*Endpoint).InjectLinkAddr(...)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/link/channel/channel.go:196
gvisor.dev/gvisor/pkg/tcpip/link/channel.(*Endpoint).InjectInbound(...)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/link/channel/channel.go:191
tailscale.com/wgengine/netstack.(*Impl).injectInbound(0xc0001e0000, 0xc0000a0080, 0xc0000aad80)
tailscale.com/wgengine/netstack/netstack.go:577 +0x744
tailscale.com/net/tstun.(*Wrapper).filterIn(0xc000214000, {0xc0000b8010, 0x1b, 0xffef})
tailscale.com/net/tstun/wrap.go:614 +0x78d
tailscale.com/net/tstun.(*Wrapper).Write(0xc000214000, {0xc0000b8000, 0xc0002e2a00, 0x1}, 0x10)
tailscale.com/net/tstun/wrap.go:627 +0x7c
golang.zx2c4.com/wireguard/device.(*Peer).RoutineSequentialReceiver(0xc000595c00)
golang.zx2c4.com/wireguard@v0.0.0-20211116201604-de7c702ace45/device/receive.go:477 +0x4d1
created by golang.zx2c4.com/wireguard/device.(*Peer).Start
golang.zx2c4.com/wireguard@v0.0.0-20211116201604-de7c702ace45/device/peer.go:199 +0x295
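For context on what this panic means: gVisor's generated reference-counting code treats an IncRef on a count that has already reached zero as a use-after-free and panics. A minimal sketch of that invariant (illustrative only; this is not gVisor's actual packetBufferRefs implementation, and the type and field names below are made up):

package main

import (
    "fmt"
    "sync/atomic"
)

// refs stands in for a generated counter such as stack.packetBufferRefs.
type refs struct {
    count int64 // outstanding references; 0 means the object is released
}

// IncRef takes a new reference. If the count was already zero the object
// has been released, so reviving it is a bug; panic with the same style
// of message seen in the trace above.
func (r *refs) IncRef() {
    if v := atomic.AddInt64(&r.count, 1); v <= 1 {
        panic(fmt.Sprintf("Incrementing non-positive count %p", r))
    }
}

// DecRef drops a reference; reaching zero releases the object.
func (r *refs) DecRef() {
    if atomic.AddInt64(&r.count, -1) < 0 {
        panic(fmt.Sprintf("Decrementing non-positive count %p", r))
    }
}

func main() {
    r := &refs{count: 1} // one owner reference
    r.DecRef()           // owner releases; count drops to 0
    r.IncRef()           // use after release: panics as above
}

Read against the trace, this suggests that some path in packet delivery released the PacketBuffer (its count reached zero) before udp.(*endpoint).HandlePacket attempted to take its own reference.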

Steps to reproduce

No deterministic procedure exists, but using Tailscale v1.22.0+ in the presence of heavy UDP traffic appears to be one way to reproduce the problem.

runsc version

n/a

docker version (if using docker)

# not using docker
AWS ECS Fargate

uname

No response

kubectl (if using Kubernetes)

n/a

repo state (if built from source)

release-20220314.0-4228-g536b85ae1

runsc debug logs (if available)

Not available.

Thanks for reporting this issue. I suspect that it's related to a fragmentation issue that was fixed (6a28dc7) on the same day that tailscale/tailscale@1af2622 was created. The fact that the bug seems to happen mostly under heavy traffic supports this. It's possible that you just missed pulling in that fix. Can you check which version of gVisor was pulled in?
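One way to check (a sketch, assuming a module-built binary; note that the pseudo-versions in the stack trace above, e.g. v0.0.0-20220126021142-d8aa030b2591, already end in the short hash of the vendored gVisor commit):

package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    info, ok := debug.ReadBuildInfo()
    if !ok {
        fmt.Println("no module build info embedded in this binary")
        return
    }
    for _, dep := range info.Deps {
        if dep.Path == "gvisor.dev/gvisor" {
            // The trailing hash of the pseudo-version identifies the commit.
            fmt.Println("gvisor.dev/gvisor:", dep.Version)
        }
    }
}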

Thanks. Tailscale synced once @ d8aa030 and then again @ 536b85a. The latter includes the fix you mentioned. Since that commit has not been released yet, I am going to assume this is the issue and close it accordingly.