google / gvisor

Application Kernel for Containers

Home Page: https://gvisor.dev


Netfilter config is not handled on runsc boot, reopen of: "DNS not working in Docker Compose"

pkit opened this issue

Description

See #115, which was closed prematurely.

The problem is that docker-compose installs a special netfilter config that remaps DNS traffic for 127.0.0.11 onto ephemeral ports. For example:

Chain DOCKER_OUTPUT (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            127.0.0.11           tcp dpt:53 to:127.0.0.11:40107
    3   232 DNAT       udp  --  *      *       0.0.0.0/0            127.0.0.11           udp dpt:53 to:127.0.0.11:33195

Chain DOCKER_POSTROUTING (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 SNAT       tcp  --  *      *       127.0.0.11           0.0.0.0/0            tcp spt:40107 to::53
    0     0 SNAT       udp  --  *      *       127.0.0.11           0.0.0.0/0            udp spt:33195 to::53

That's why UDP to 127.0.0.11:53 doesn't work.
Which brings us to the bug itself: if gVisor already copies the lo routes and ARP config, why shouldn't it do the same for the netfilter config?
I think it should; otherwise bugs like this will keep cropping up.
Will try to assemble a PR for that soon.

Steps to reproduce

See #115

runsc version

548d12773965831811b2fe719df0c8c0da4e8a61

docker version (if using docker)

Client: Docker Engine - Community
 Version:           20.10.14
 API version:       1.41
 Go version:        go1.16.15
 Git commit:        a224086
 Built:             Thu Mar 24 01:48:02 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.14
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.15
  Git commit:       87a90dc
  Built:            Thu Mar 24 01:45:53 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.11
  GitCommit:        3df54a852345ae127d1fa3092b95168e4a88e2f8
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

uname

Linux machine1 5.17.2-051702-generic #202204111357-Ubuntu SMP PREEMPT Mon Apr 11 14:08:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

No response

And yes, it works if the correct (remapped) port is used:

$ sudo nsenter -n -t $(docker inspect --format {{.State.Pid}} gvisor_ubuntu-gvisor_1) iptables-legacy -nvL -t nat
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 5 packets, 317 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DOCKER_OUTPUT  all  --  *      *       0.0.0.0/0            127.0.0.11          

Chain POSTROUTING (policy ACCEPT 5 packets, 317 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DOCKER_POSTROUTING  all  --  *      *       0.0.0.0/0            127.0.0.11          

Chain DOCKER_OUTPUT (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            127.0.0.11           tcp dpt:53 to:127.0.0.11:40637
    0     0 DNAT       udp  --  *      *       0.0.0.0/0            127.0.0.11           udp dpt:53 to:127.0.0.11:49200

Chain DOCKER_POSTROUTING (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 SNAT       tcp  --  *      *       127.0.0.11           0.0.0.0/0            tcp spt:40637 to::53
    0     0 SNAT       udp  --  *      *       127.0.0.11           0.0.0.0/0            udp spt:49200 to::53

As shown above, UDP is remapped: 127.0.0.11:53 -> 127.0.0.11:49200

$ sudo nsenter -n -t $(docker inspect --format {{.State.Pid}} gvisor_ubuntu-gvisor_1) dig google.com -p 49200 @127.0.0.11

; <<>> DiG 9.16.1-Ubuntu <<>> google.com -p 49200 @127.0.0.11
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3925
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		118	IN	A	172.217.20.206

;; Query time: 8 msec
;; SERVER: 127.0.0.11#49200(127.0.0.11)
;; WHEN: Thu Apr 28 19:51:16 CEST 2022
;; MSG SIZE  rcvd: 55

@pkit I am not sure from your description what is not working with gVisor. Can you provide a concise explanation of what you expect to work and what is not working?

I am not averse to picking up netfilter settings at startup, but our iptables implementation is not as complete as Linux's, and picking up netfilter rules entails a bit of risk.

@kevinGC should have a better answer for you.

@hbhasker
DNS access to 127.0.0.11 over port 53 is not working, because that traffic is expected to be redirected to other ports by iptables rules, and none of those rules are copied over.

It also means there is a security risk in the current implementation: for example, the host's startup rules may deny some access, but gVisor effectively rewrites it all to accept unconditionally...

I have started on an implementation anyway. Will update when I have some results.

FYI, the place in moby/docker where that resolver is set up: here

@pkit Thanks for the clarification, looking at the rules I think those will work with gvisor. Please add @kevinGC @ghanan94 as reviewers once you have a PR ready.

btw, I was thinking about this, and I am not sure copying in the rules will work. If Docker redirects traffic to a loopback address like 127.0.0.11, that rule will direct it to the sentry's internal loopback. But dockerd does not listen on the sentry's internal loopback; it listens on the loopback of the host's container namespace.

@hbhasker
As I've tested above, in gVisor, dig google.com -p 49200 @127.0.0.11 works as expected. The only problem is the remapping from port 53 to 49200 (or whatever ephemeral port Docker picked).

You tested by entering the host container namespace, so that dig was not running inside gVisor. For that you would have to use docker exec to enter the gVisor sandbox and run the command there.

Yup, did that too IIRC. But will re-check.

@hbhasker yup, you were right, it doesn't route to 127.0.0.11 from inside gVisor.
Will think about it.

@hbhasker still, not having iptables set up during boot is a security problem. Traffic may be restricted under vanilla docker/containerd but will be unrestricted under gVisor.

In general we do not trust the sentry, so any enforcement inside the sentry isn't safe from potential compromise.

E.g. a privilege escalation bug in gVisor could allow a user to delete any such network enforcement via iptables.

As such we expect that all such enforcement should happen outside the container on the host.

As for this specific problem, maybe the right solution for gVisor would be to generate a resolv.conf pointing to the proper IP/port for DNS.

That's an internal DNS server; it doesn't have a non-127.0.0.0/8 address.
It looks like it's really a Docker problem: for example, https://github.com/containerd/nerdctl has no problem routing correctly, because it uses a dedicated, routed host DNS service.
But it's kind of bad for testing, as a lot of test tools still use Docker.
Will update if I find something interesting.

For others running into this, I was able to work around this issue by setting network_mode: bridge in my compose.json. It seems the issue only happens with Compose-managed bridges. My resolv.conf in this configuration shows the default nameserver for my network.
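For reference, that workaround looks roughly like this as a Compose fragment (the service and image names are placeholders; note that network_mode: bridge puts the container on Docker's default bridge, which has no embedded 127.0.0.11 resolver, so resolv.conf falls back to the host's nameservers):

```yaml
# Hypothetical compose fragment: opt the service out of the
# Compose-managed bridge and its embedded DNS resolver.
services:
  app:
    image: ubuntu:22.04
    runtime: runsc
    network_mode: bridge  # default Docker bridge, no 127.0.0.11 resolver
```

Keep in mind that Compose's name-based service discovery relies on the embedded resolver, so it is lost with this workaround.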

A friendly reminder that this issue had no activity for 120 days.

This issue has been closed due to lack of activity.

Please reopen, because this issue is still present and a problem when using gVisor with Docker Compose.

This is a tough problem to solve. At a high level: putting an application inside gVisor sandboxes it, but Compose puts the DNS server outside the sandbox. I don't know of an especially simple solution.

As Ian mentioned in the other bug, the best way to deal with this is to change the DNS server in use. If you need the Compose DNS server specifically, you could modify iptables rules outside the sandbox to redirect DNS traffic to Compose DNS.

FWIW we now support scraping netfilter rules from the host (see b119cc3 and e7e8d0f), which can be enabled via the --reproduce-nftables flag. But this still doesn't solve the problem: DNS packets are now redirected via the sandbox loopback, but Docker's DNS server is listening on the host loopback (as Bhasker mentioned above).
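For anyone who wants to try the flag: runsc flags are normally passed via runtimeArgs in Docker's daemon.json. A sketch (the runsc path is an assumption; adjust for your install and restart dockerd afterwards):

```json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": ["--reproduce-nftables"]
    }
  }
}
```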

I'd love a better solution than "add gVisor-specific configuration", but in this case I'm not sure what that would be.

Is there, maybe, a generic way to add an nftables NAT/redirect rule to capture DNS lookups from within the sandbox and send them to the host loopback DNS server? I am thinking of a (pseudo-)static IP address, specified as the DNS server for the containers in the gVisor sandbox (e.g. via docker-compose), that can be redirected generically to the Compose-managed host DNS resolver. That might potentially be generic across all containers.

I'm not sure of a generic way to do it, but it could certainly be done. Note that I'm not familiar with Docker Compose, so I'm not sure how difficult this is.

  1. Add an iptables rule inside your containers that DNATs requests. For example, 127.0.0.11:53 (or *:53 if you want all DNS traffic) might become 10.1.2.3:53.
  2. Add an iptables rule outside your containers that DNATs 10.1.2.3:53 to the actual DNS server 127.0.0.11:40107.
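Sketched as iptables rules, the two steps might look like this. These are illustrative fragments only: 10.1.2.3 is the hypothetical placeholder address from above, port 40107 is the example ephemeral port from earlier in this thread (the real one has to be read from Docker's DOCKER_OUTPUT chain), and delivering to a loopback destination from PREROUTING may additionally require route_localnet on the host side.

```shell
# Step 1 -- inside the container (and therefore visible to gVisor):
# send DNS for 127.0.0.11 to a routable placeholder address.
iptables -t nat -A OUTPUT -d 127.0.0.11 -p udp --dport 53 \
  -j DNAT --to-destination 10.1.2.3:53

# Step 2 -- outside the sandbox, in the host-side container netns:
# forward the placeholder address to the embedded resolver's real port.
iptables -t nat -A PREROUTING -d 10.1.2.3 -p udp --dport 53 \
  -j DNAT --to-destination 127.0.0.11:40107
```

TCP DNS would need a matching pair of rules with -p tcp and the TCP ephemeral port.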