gvisor: panic: Incrementing non-positive count 0xc000533180 on stack.PacketBuffer
jonseymour opened this issue · comments
Description
A change dating from November 11 introduced reference counting for UDP packets, which appears to have caused a panic in tailscale, a consumer of this library; see "gvisor: panic: Incrementing non-positive count 0xc000533180 on stack.PacketBuffer".
The commits at issue are possibly one or more of the following:
The evidence for this is that the tailscale code base did not include these commits until the v1.22.0 release. Earlier versions used a fork of this library, but tailscale reverted to the mainline code in commit tailscale/tailscale@1af2622, after which the issue appeared.
It is clear from the stack traces that the issue relates to errors in the reference counting of UDP packets, which the commits above introduced.
AFAICT, a reliable, deterministic test case for this does not exist, but it does seem reasonably clear that there is a problem with the reference counting of UDP packets, so I have raised this issue here to at least document it.
panic: Incrementing non-positive count 0xc0000ea380 on stack.PacketBuffer
goroutine 372 [running]:
gvisor.dev/gvisor/pkg/tcpip/stack.(*packetBufferRefs).IncRef(0xc0000ea380)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/stack/packet_buffer_refs.go:80 +0x105
gvisor.dev/gvisor/pkg/tcpip/transport/udp.(*endpoint).HandlePacket(0xc0001c4c00, {0x998, {0xc00055a7d0, 0x4}, 0xe8b5, {0xc00055a7cc, 0x4}}, 0xc0000ea380)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/transport/udp/endpoint.go:924 +0x2ed
gvisor.dev/gvisor/pkg/tcpip/stack.(*endpointsByNIC).handlePacket(0xc0006331d0, {0x998, {0xc00055a7d0, 0x4}, 0xe8b5, {0xc00055a7cc, 0x4}}, 0xc0000ea380)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/stack/transport_demuxer.go:185 +0x27f
gvisor.dev/gvisor/pkg/tcpip/stack.(*transportDemuxer).deliverPacket(0xc0004601c8, 0x11, 0xc0000ea380, {0x998, {0xc00055a7d0, 0x4}, 0xe8b5, {0xc00055a7cc, 0x4}})
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/stack/transport_demuxer.go:595 +0x3c5
gvisor.dev/gvisor/pkg/tcpip/stack.(*nic).DeliverTransportPacket(0xc00000a1e0, 0x11, 0xc0000ea380)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/stack/nic.go:861 +0x1fe
gvisor.dev/gvisor/pkg/tcpip/network/ipv4.(*endpoint).handleValidatedPacket(0xc0002a3800, {0xc00028c160, 0x14, 0x20}, 0xc0000ea700, {0x0, 0x0})
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/network/ipv4/ipv4.go:998 +0xfd6
gvisor.dev/gvisor/pkg/tcpip/network/ipv4.(*endpoint).HandlePacket(0xc0002a3800, 0x886375)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/network/ipv4/ipv4.go:784 +0x2b8
gvisor.dev/gvisor/pkg/tcpip/stack.(*nic).DeliverNetworkPacket(0xc00000a1e0, {0xc0007efbf0, 0x1}, {0x0, 0x0}, 0x800, 0xc0000ea700)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/stack/nic.go:777 +0x2dd
gvisor.dev/gvisor/pkg/tcpip/link/channel.(*Endpoint).InjectLinkAddr(...)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/link/channel/channel.go:196
gvisor.dev/gvisor/pkg/tcpip/link/channel.(*Endpoint).InjectInbound(...)
gvisor.dev/gvisor@v0.0.0-20220126021142-d8aa030b2591/pkg/tcpip/link/channel/channel.go:191
tailscale.com/wgengine/netstack.(*Impl).injectInbound(0xc0001e0000, 0xc0000a0080, 0xc0000aad80)
tailscale.com/wgengine/netstack/netstack.go:577 +0x744
tailscale.com/net/tstun.(*Wrapper).filterIn(0xc000214000, {0xc0000b8010, 0x1b, 0xffef})
tailscale.com/net/tstun/wrap.go:614 +0x78d
tailscale.com/net/tstun.(*Wrapper).Write(0xc000214000, {0xc0000b8000, 0xc0002e2a00, 0x1}, 0x10)
tailscale.com/net/tstun/wrap.go:627 +0x7c
golang.zx2c4.com/wireguard/device.(*Peer).RoutineSequentialReceiver(0xc000595c00)
golang.zx2c4.com/wireguard@v0.0.0-20211116201604-de7c702ace45/device/receive.go:477 +0x4d1
created by golang.zx2c4.com/wireguard/device.(*Peer).Start
golang.zx2c4.com/wireguard@v0.0.0-20211116201604-de7c702ace45/device/peer.go:199 +0x295
Steps to reproduce
No deterministic procedure exists, but running tailscale v1.22.0+ in the presence of heavy UDP traffic appears to be one way to reproduce the problem.
runsc version
n/a
docker version (if using docker)
# not using docker
AWS ECS Fargate
uname
No response
kubectl (if using Kubernetes)
n/a
repo state (if built from source)
release-20220314.0-4228-g536b85ae1
runsc debug logs (if available)
Not available.
Thanks for reporting this issue. I suspect that it's related to a fragmentation issue that was fixed (6a28dc7) on the same day that tailscale/tailscale@1af2622 was created. The fact that the bug seems to happen mostly under heavy traffic supports this. It's possible that you just missed pulling in that fix. Can you check which version of gVisor was pulled in?