google / gvisor

Application Kernel for Containers

Home Page: https://gvisor.dev

Corrupt data when transferring data using io.Copy in a Golang program

gmwiz opened this issue

Description

Some sandboxed servers only accept connections over a TCP socket. To use them, I wrote a tiny Unix domain socket to TCP bridge program in Go. It listens on a host Unix domain socket and connects to a sandboxed application server that listens on a given TCP port, which is essentially the Go equivalent of socat UNIX-LISTEN:unix.sock TCP-CONNECT:1234. In my tests, I encountered an odd problem that I was later able to reproduce using a smaller program.

My Go program called io.Copy in two different goroutines, one running io.Copy(tcpSock, unixSock) and the other running io.Copy(unixSock, tcpSock). What I noticed is that after all of the data had been transferred, either the data received within the sandbox was corrupt or the connection had been abruptly disconnected. To work around this, I modified my program to use the Reader and Writer interfaces directly. I noticed that with io.Copy, the Go program calls splice internally and moves buffers between pipes and sockets, whereas with plain Read and Write calls this does not happen.
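The Read/Write workaround mentioned above can be sketched roughly as follows. This is a minimal illustration, not the exact code from my bridge program: plainCopy and roundTrip are hypothetical names, and the 32 KiB buffer size is an arbitrary choice. The point is only that the loop calls Read and Write directly and never looks at io.ReaderFrom or io.WriterTo, which is the path through which io.Copy on socket types can end up in splice on Linux.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// plainCopy (hypothetical helper) copies src to dst using only Read and
// Write. Unlike io.Copy, it never consults io.ReaderFrom or io.WriterTo,
// so no zero-copy fast path such as splice(2) can be selected.
func plainCopy(dst io.Writer, src io.Reader) (int64, error) {
	buf := make([]byte, 32*1024) // buffer size is an arbitrary assumption
	var total int64
	for {
		n, rerr := src.Read(buf)
		if n > 0 {
			w, werr := dst.Write(buf[:n])
			total += int64(w)
			if werr != nil {
				return total, werr
			}
			if w < n {
				return total, io.ErrShortWrite
			}
		}
		if rerr == io.EOF {
			return total, nil // clean end of stream
		}
		if rerr != nil {
			return total, rerr
		}
	}
}

// roundTrip (hypothetical helper) pushes payload through plainCopy using
// in-memory buffers, so the loop can be exercised without any sockets.
func roundTrip(payload []byte) ([]byte, int64, error) {
	var dst bytes.Buffer
	n, err := plainCopy(&dst, bytes.NewReader(payload))
	return dst.Bytes(), n, err
}

func main() {
	out, n, err := roundTrip([]byte("hello gVisor"))
	fmt.Printf("copied %d bytes (err: %v): %q\n", n, err, out)
}
```

In the bridge, the two io.Copy calls would be replaced with plainCopy(tcpClient, unixClient) and plainCopy(unixClient, tcpClient), at the cost of an extra userspace copy per chunk.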

See the reproducer below and the attached runsc debug log.
Any help or guidance would be highly appreciated.

Steps to reproduce

Statically compile the following Golang program using Go 1.21.6:

cat << EOF > xfer.go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"os"
	"strconv"
	"sync"
)

func main() {
	if len(os.Args) != 3 {
		panic("Usage: <unix_listen_path> <tcp_connect_port>")
	}

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	unixListenPath := os.Args[1]
	tcpConnectPort, err := strconv.Atoi(os.Args[2])
	if err != nil {
		panic(err)
	}

	if err = os.Remove(unixListenPath); err != nil && !os.IsNotExist(err) {
		panic(err)
	}
	unixListener := net.ListenConfig{}
	unixServer, err := unixListener.Listen(ctx, "unix", unixListenPath)
	if err != nil {
		panic(err)
	}
	defer func() { _ = unixServer.Close() }()

	unixClient, err := unixServer.Accept()
	if err != nil {
		panic(err)
	}
	defer func() { _ = unixClient.Close() }()
	fmt.Println("unix client connected")

	tcpDialer := net.Dialer{}
	tcpClient, err := tcpDialer.DialContext(ctx, "tcp", fmt.Sprintf("127.0.0.1:%d", tcpConnectPort))
	if err != nil {
		panic(err)
	}
	defer func() { _ = tcpClient.Close() }()
	fmt.Println("tcp client connected")

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		n, err := io.Copy(unixClient, tcpClient)
		fmt.Printf("tcp->unix: transfered %d bytes (err: %v)\n", n, err)
	}()
	wg.Add(1)
	go func() {
		defer wg.Done()
		n, err := io.Copy(tcpClient, unixClient)
		fmt.Printf("unix->tcp: transfered %d bytes (err: %v)\n", n, err)
	}()
	fmt.Println("transfer started")
	wg.Wait()
	fmt.Println("transfer done")
}
EOF

docker run -it --rm -v $(pwd):/src -w /src -e CGO_ENABLED=0 golang:1.21.6-bookworm go build -ldflags "-w -s" -o xfer xfer.go

Install the gVisor runtime:

sudo runsc install --runtime runsc-unix-debug -- \
  --host-uds=all \
  --debug \
  --debug-log=/tmp/runsc-debug.log \
  --strace \
  --log-packets

Create a file to transfer (host-blob.bin), run the sandboxed mock server, and try to transfer host-blob.bin into the sandboxed container:

sudo rm -f ./unix.sock ./host-blob.bin /tmp/runsc-debug.log
sudo dd if=/dev/urandom of=host-blob.bin bs=1M count=16
md5sum ./host-blob.bin
docker run --runtime=runsc-unix-debug -v $(pwd):/test --workdir /test --rm -it --entrypoint '' alpine/socat:1.8.0.0 sh -c 'rm -f /tmp/sandbox-blob.bin; socat TCP-LISTEN:1234 - > /tmp/sandbox-blob.bin & ./xfer ./unix.sock 1234 & wait; md5sum ./host-blob.bin /tmp/sandbox-blob.bin; ls -la ./host-blob.bin /tmp/sandbox-blob.bin'

In a different shell, run:

sudo socat UNIX-CONNECT:unix.sock - < host-blob.bin

Example output:

16+0 records in
16+0 records out
16777216 bytes (17 MB, 16 MiB) copied, 0.0342477 s, 490 MB/s
52aebb94eea57f9f99394ce57afa5423  ./host-blob.bin
unix client connected
tcp client connected
transfer started
tcp->unix: transfered 0 bytes (err: <nil>)
unix->tcp: transfered 16777216 bytes (err: <nil>)
transfer done
52aebb94eea57f9f99394ce57afa5423  ./host-blob.bin
4f6ef3405ce77457524886896eee3285  /tmp/sandbox-blob.bin
-rw-r--r--    1 root     root      16777216 Jan 27 20:14 ./host-blob.bin
-rw-r--r--    1 root     root      16777216 Jan 27 20:14 /tmp/sandbox-blob.bin

As can be seen, sandbox-blob.bin is 16 MiB in size, the same as host-blob.bin, but the MD5 hashes of the two files differ.
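Since the sizes match and only the hashes differ, it may help to know where the two files first diverge. The following is a rough sketch of one way to find that offset. firstDiff and firstDiffBytes are hypothetical helpers, not part of the reproducer, and the example in main uses in-memory data rather than the real blobs.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// firstDiff (hypothetical helper) returns the offset of the first byte at
// which a and b differ, or -1 if both streams are identical. If one stream
// is a strict prefix of the other, the length of the shorter one is returned.
func firstDiff(a, b io.Reader) (int64, error) {
	ra, rb := bufio.NewReader(a), bufio.NewReader(b)
	var off int64
	for {
		ca, errA := ra.ReadByte()
		cb, errB := rb.ReadByte()
		// Report real I/O errors before interpreting EOF.
		if errA != nil && errA != io.EOF {
			return off, errA
		}
		if errB != nil && errB != io.EOF {
			return off, errB
		}
		if errA == io.EOF || errB == io.EOF {
			if errA == errB {
				return -1, nil // both ended together: identical
			}
			return off, nil // one stream ended early
		}
		if ca != cb {
			return off, nil
		}
		off++
	}
}

// firstDiffBytes (hypothetical helper) runs firstDiff on in-memory slices.
func firstDiffBytes(a, b []byte) (int64, error) {
	return firstDiff(bytes.NewReader(a), bytes.NewReader(b))
}

func main() {
	off, err := firstDiffBytes([]byte("abcdef"), []byte("abcXef"))
	fmt.Printf("first differing offset: %d (err: %v)\n", off, err)
}
```

To compare the actual blobs, the two files could be opened with os.Open and passed to firstDiff directly, since *os.File satisfies io.Reader.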

runsc version

runsc version release-20240122.0
spec: 1.1.0-rc.1

docker version (if using docker)

Client:
 Version:           20.10.25+dfsg1
 API version:       1.41
 Go version:        go1.21.5
 Git commit:        b82b9f3
 Built:             Mon Jan  8 00:09:17 2024
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.25+dfsg1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.21.5
  Git commit:       5df983c
  Built:            Mon Jan  8 00:09:17 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.24~ds1
  GitCommit:        1.6.24~ds1-1
 runc:
  Version:          1.1.10+ds1
  GitCommit:        1.1.10+ds1-1
 docker-init:
  Version:          0.19.0
  GitCommit:

uname

Linux 6.6.9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.9-1 (2024-01-01) x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

runsc-debug.log