google / gvisor

Application Kernel for Containers

Home Page:https://gvisor.dev

gVisor writes ethernet frames to tun devices

benbuzbee opened this issue · comments

Description

If we create a tun device inside the netns that gVisor will use, gVisor writes Ethernet frames to it when it should write IP packets.

https://www.kernel.org/doc/Documentation/networking/tuntap.txt

Depending on the type of device chosen the userspace program has to read/write
  IP packets (with tun) or ethernet frames (with tap)

tun devices are more common, for example openvpn --mktun or wireguard-go. OpenVPN can support a tap device, but it is not the default. wireguard-go, on the other hand, cannot.

In our particular use case, we would like to create a tun device inside the sandbox that will ship frames via wireguard and an ethernet device that does not live in the network namespace, for added egress security.

In a related problem, gVisor ignores the pointtopoint and noarp flags on the interface and requires a gateway & ARP. Ideally gVisor notes the pointtopoint device and doesn't try to ARP, and also allows us to specify a default route without a gateway in this case.

Steps to reproduce

These steps are a bit annoying because you need something that reads the tun data, so I took https://web.ecs.syr.edu/~wedu/seed/Labs/VPN/files/simpletun.c and modified its server mode to just drop the packets for demonstration. The modified code is attached; because the attachment has a .txt extension, compile it via gcc -x c simpletun.txt -o simpletun
simpletun.txt

Create rootfs with network tools

ID=$(docker run -d --rm --entrypoint /usr/bin/tail praqma/network-multitool -f /dev/null)
docker export $ID> root.tar
docker stop $ID
mkdir root
tar -C root -xf root.tar

Create test container and configure netns with tun device

Simple config.json.txt

You will need to compile the attached code for simpletun to read from the tun device and drop packets so the requests don't hang

sudo runsc create test
sudo rm -f /var/run/netns/test && sudo ln -s /proc/$(sudo runsc state test | jq -r '.pid')/ns/net /var/run/netns/test
sudo ip netns exec test ./simpletun -i tun0 -s -d
sudo ip netns exec test ip link set lo up
sudo ip netns exec test ip link set tun0 up
sudo ip netns exec test ip addr add 10.0.2.2/24 dev tun0
sudo ip netns exec test ip route add default via 10.0.2.1 dev tun0

Start it and check settings

$ sudo ip netns exec test ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: tun0: <NO-CARRIER,POINTOPOINT,MULTICAST,NOARP,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 100
    link/none 
    inet 10.0.2.2/24 scope global tun0
       valid_lft forever preferred_lft forever
    
$ sudo runsc start test

$ sudo runsc exec test /sbin/ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope global dynamic 
    inet6 ::1/128 scope global dynamic 
2: tun0: <UP,LOWER_UP> mtu 1500 
    link/ether 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 10.0.2.2/24 scope global dynamic 
    inet6 fe80::98e5:3224:586a:4f9d/64 scope global dynamic 

OK, so now we have gVisor running with the tun device. Note that it has dropped the POINTOPOINT, MULTICAST, and NOARP flags and given the device the MAC address 00:00:00:00:00:00.

Verify gVisor is writing ethernet frames

In one terminal:
$ sudo ip netns exec test tcpdump -vv -l -x -i tun0

In another:
$ sudo runsc exec test /usr/bin/curl -vv 10.0.2.10

Note:

tcpdump: listening on tun0, link-type RAW (Raw IP), capture size 262144 bytes
19:27:47.814582 unknown ip 15
        0x0000:  ffff ffff ffff 0000 0000 0000 0806 0001
        0x0010:  0800 0604 0001 0000 0000 0000 0a00 0202
        0x0020:  0000 0000 0000 0a00 0264

This is an Ethernet ARP frame. tcpdump is also confused because it knows the device is a tun and doesn't expect this framing.
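To make the reading of that hex dump explicit, here is a minimal Go sketch that decodes the captured bytes as an Ethernet frame carrying ARP. The `summary` type and `decodeARPFrame` function are illustrative names introduced here, not taken from any tool in this report. It also extracts the top four bits of the first byte, which is what a raw-IP decoder would treat as the IP version field.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"net"
	"strings"
)

// capture holds the bytes tcpdump printed for the frame written to the tun device.
const capture = "ffff ffff ffff 0000 0000 0000 0806 0001 " +
	"0800 0604 0001 0000 0000 0000 0a00 0202 " +
	"0000 0000 0000 0a00 0264"

// summary is an illustrative type holding the fields of interest.
type summary struct {
	ipVersionNibble uint8            // first 4 bits, as a raw-IP decoder would read them
	dst, src        net.HardwareAddr // Ethernet destination and source
	etherType       uint16           // 0x0806 means ARP
	arpSender       net.IP           // ARP sender protocol address
}

// decodeARPFrame parses a hex dump as a 14-byte Ethernet header
// followed by a 28-byte IPv4-over-Ethernet ARP payload.
func decodeARPFrame(hexDump string) (summary, error) {
	raw, err := hex.DecodeString(strings.ReplaceAll(hexDump, " ", ""))
	if err != nil {
		return summary{}, err
	}
	if len(raw) < 14+28 {
		return summary{}, fmt.Errorf("short frame: %d bytes", len(raw))
	}
	return summary{
		ipVersionNibble: raw[0] >> 4,
		dst:             net.HardwareAddr(raw[0:6]),
		src:             net.HardwareAddr(raw[6:12]),
		etherType:       binary.BigEndian.Uint16(raw[12:14]),
		arpSender:       net.IP(raw[28:32]),
	}, nil
}

func main() {
	s, err := decodeARPFrame(capture)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("dst=%s src=%s ethertype=0x%04x sender=%s ip-version-nibble=%d\n",
		s.dst, s.src, s.etherType, s.arpSender, s.ipVersionNibble)
}
```

The frame starts with the broadcast MAC ff:ff:ff:ff:ff:ff, so the first nibble is 15, which matches tcpdump's "unknown ip 15" line.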

You can run simpletun with -a

If you run simpletun with the -a flag, it creates a tap device instead, and you can see that tcpdump recognizes the ARP:

tcpdump: listening on tun0, link-type EN10MB (Ethernet), capture size 262144 bytes
19:29:37.704942 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.2.10 tell 10.0.2.2, length 28
        0x0000:  0001 0800 0604 0001 86d9 dba2 f42b 0a00
        0x0010:  0202 0000 0000 0000 0a00 020a

runsc version

runsc version release-20210315.0
spec: 1.0.2


docker version (if using docker)

No response

uname

Linux browser-devbox 5.8.0-23-generic #24~20.04.1-Ubuntu SMP Sat Oct 10 04:57:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

No response

Thanks for the report. I believe I understand where it's going wrong. runsc today treats all underlying FDs as Ethernet devices.

Here's where we scrape the network interfaces in the namespace:

link := boot.FDBasedLink{

and then we pass this via RPC to the sentry when it boots, here:

EthernetHeader: true,

As you can see, it's currently hardcoded to treat all underlying FDs as Ethernet. We could trivially change that to say

EthernetHeader: mac != "",

I believe that should fix most of the issues.

Could you try that? If it works, I can roll up a PR to fix this.

That seems to be on the right track. I will have to integrate more pieces to really verify it's working correctly end to end, but I can see that it now sends a TCP SYN:

18:18:08.888624 IP (tos 0x0, ttl 64, id 21798, offset 0, flags [none], proto TCP (6), length 60)
    10.0.2.2.46653 > 10.0.2.10.http: Flags [S], cksum 0xfd48 (correct), seq 1809025410, win 29184, options [mss 1460,sackOK,TS val 2298317770 ecr 0,nop,wscale 7], length 0

I will try to spend more time and verify e2e