google / gvisor

Application Kernel for Containers

Home Page: https://gvisor.dev

tcpip/link/fdbased: socket operation on non-socket

xjasonlyu opened this issue

Description

Since commit e511fc9 removed the WritePacket method from the tcpip/link/fdbased package, this error can occur under certain circumstances.

For example, when we use tcpip/link/tun to open a TUN device:

fd, err := tun.Open(t.name)
if err != nil {
	return nil, fmt.Errorf("create tun: %w", err)
}

Then pass the TUN fd in fdbased.Options and add the endpoint to the stack:

ep, err := fdbased.New(&fdbased.Options{
	FDs: []int{fd},
	...
})

The endpoint's WritePackets method is then called to write outbound packets:

func (e *endpoint) WritePackets(_ stack.RouteInfo, pkts stack.PacketBufferList, _ tcpip.NetworkProtocolNumber) (int, tcpip.Error) {

WritePackets in turn calls sendBatch:

n, err := e.sendBatch(batchFD, batch)

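However, in sendBatch the else branch is taken, which calls rawfile.NonBlockingSendMMsg(batchFD, mmsgHdrs). The if branch, which falls back to writePacket, only runs when batch[0] cannot fit into an mmsghdr while staying under e.maxSyscallHeaderBytes, so for typical packets the sendmmsg path is used: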

if len(mmsgHdrs) == 0 {
	// We can't fit batch[0] into a mmsghdr while staying under
	// e.maxSyscallHeaderBytes. Use WritePacket, which will avoid the
	// mmsghdr (by using writev) and re-buffer iovecs more aggressively
	// if necessary (by using e.writevMaxIovs instead of
	// rawfile.MaxIovs).
	pkt := batch[0]
	if err := e.writePacket(pkt.EgressRoute, pkt.NetworkProtocolNumber, pkt); err != nil {
		return packets, err
	}
	packets++
} else {
	for len(mmsgHdrs) > 0 {
		sent, err := rawfile.NonBlockingSendMMsg(batchFD, mmsgHdrs)
		if err != nil {
			return packets, err
		}
		packets += sent
		mmsgHdrs = mmsgHdrs[sent:]
	}
}

This is where the problem occurs: the sendmmsg syscall is issued on a non-socket fd (here, the TUN fd), so it fails with ENOTSOCK, "socket operation on non-socket":

func NonBlockingSendMMsg(fd int, msgHdrs []MMsgHdr) (int, tcpip.Error) {
	n, _, e := unix.RawSyscall6(unix.SYS_SENDMMSG, uintptr(fd), uintptr(unsafe.Pointer(&msgHdrs[0])), uintptr(len(msgHdrs)), unix.MSG_DONTWAIT, 0, 0)
	if e != 0 {
		return 0, TranslateErrno(e)
	}
	return int(n), nil
}

Steps to reproduce

Mentioned above.

runsc version

null

docker version (if using docker)

null

uname

null

kubectl (if using Kubernetes)

null

repo state (if built from source)

null

runsc debug logs (if available)

null

Thanks for the detailed report. I will take a look.

We just need to record that the underlying FD is not a socket and, in that case, always degrade WritePackets to writePacket.

We do the socket check in fdbased.New(), but it's not plumbed through to the write path. We should do that.