Netfilter config is not handled on runsc boot, reopen of: "DNS not working in Docker Compose"
pkit opened this issue
Description
See #115, which was closed pretty prematurely.
The problem is that docker-compose uses a special netfilter config that remaps 127.0.0.11 in a pretty funny way, i.e.:
```
Chain DOCKER_OUTPUT (1 references)
pkts bytes target prot opt in out source destination
0 0 DNAT tcp -- * * 0.0.0.0/0 127.0.0.11 tcp dpt:53 to:127.0.0.11:40107
3 232 DNAT udp -- * * 0.0.0.0/0 127.0.0.11 udp dpt:53 to:127.0.0.11:33195

Chain DOCKER_POSTROUTING (1 references)
pkts bytes target prot opt in out source destination
0 0 SNAT tcp -- * * 127.0.0.11 0.0.0.0/0 tcp spt:40107 to::53
0 0 SNAT udp -- * * 127.0.0.11 0.0.0.0/0 udp spt:33195 to::53
```
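For illustration only, rules of this shape could be recreated by hand roughly as follows. This is a sketch, not Docker's actual setup code; the chain wiring is simplified, and ports 40107/33195 are just the ephemeral ports dockerd happened to pick in this run (they change on every start):

```shell
# Sketch: hand-built equivalent of the embedded-DNS NAT remapping above.
# Create the chain and hook it into OUTPUT for traffic to 127.0.0.11.
iptables -t nat -N DOCKER_OUTPUT
iptables -t nat -A OUTPUT -d 127.0.0.11 -j DOCKER_OUTPUT

# Remap port 53 to the ephemeral ports the resolver is actually bound to.
iptables -t nat -A DOCKER_OUTPUT -d 127.0.0.11 -p tcp --dport 53 \
  -j DNAT --to-destination 127.0.0.11:40107
iptables -t nat -A DOCKER_OUTPUT -d 127.0.0.11 -p udp --dport 53 \
  -j DNAT --to-destination 127.0.0.11:33195
```

These commands require root and a network namespace to apply in, so they are shown as a configuration sketch only.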
That's why UDP to 127.0.0.11:53 doesn't work.
Which brings us to the bug itself: if gVisor already copies lo routes and arp config, why shouldn't it do the same for the netfilter stuff? I think it should, otherwise these bugs will keep cropping up.
Will try to assemble a PR on that soon.
Steps to reproduce
See #115
runsc version
548d12773965831811b2fe719df0c8c0da4e8a61
docker version (if using docker)
```
Client: Docker Engine - Community
 Version:           20.10.14
 API version:       1.41
 Go version:        go1.16.15
 Git commit:        a224086
 Built:             Thu Mar 24 01:48:02 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.14
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.15
  Git commit:       87a90dc
  Built:            Thu Mar 24 01:45:53 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.11
  GitCommit:        3df54a852345ae127d1fa3092b95168e4a88e2f8
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```
uname
```
Linux machine1 5.17.2-051702-generic #202204111357-Ubuntu SMP PREEMPT Mon Apr 11 14:08:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
```
kubectl (if using Kubernetes)
No response
repo state (if built from source)
No response
runsc debug logs (if available)
No response
And yes, it works if the correct port is used:
```
$ sudo nsenter -n -t $(docker inspect --format {{.State.Pid}} gvisor_ubuntu-gvisor_1) iptables-legacy -nvL -t nat
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 5 packets, 317 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER_OUTPUT all -- * * 0.0.0.0/0 127.0.0.11

Chain POSTROUTING (policy ACCEPT 5 packets, 317 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER_POSTROUTING all -- * * 0.0.0.0/0 127.0.0.11

Chain DOCKER_OUTPUT (1 references)
pkts bytes target prot opt in out source destination
0 0 DNAT tcp -- * * 0.0.0.0/0 127.0.0.11 tcp dpt:53 to:127.0.0.11:40637
0 0 DNAT udp -- * * 0.0.0.0/0 127.0.0.11 udp dpt:53 to:127.0.0.11:49200

Chain DOCKER_POSTROUTING (1 references)
pkts bytes target prot opt in out source destination
0 0 SNAT tcp -- * * 127.0.0.11 0.0.0.0/0 tcp spt:40637 to::53
0 0 SNAT udp -- * * 127.0.0.11 0.0.0.0/0 udp spt:49200 to::53
```
From the above, UDP is remapped: 127.0.0.11:53 -> 127.0.0.11:49200.
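As a hypothetical scripting aid (not something from this thread): the ephemeral UDP port can be scraped out of that listing with a one-liner. The rule text is inlined below; in practice it would come from the `iptables-legacy -nvL -t nat` command shown above:

```shell
# Hypothetical helper: extract the ephemeral port that Docker's embedded
# DNS remaps 127.0.0.11:53/udp to, from `iptables-legacy -nvL -t nat` output.
rules='0 0 DNAT udp -- * * 0.0.0.0/0 127.0.0.11 udp dpt:53 to:127.0.0.11:49200'
port=$(printf '%s\n' "$rules" | sed -n 's/.*udp dpt:53 to:127\.0\.0\.11:\([0-9]*\).*/\1/p')
echo "$port"   # 49200 for the run shown above
```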
```
$ sudo nsenter -n -t $(docker inspect --format {{.State.Pid}} gvisor_ubuntu-gvisor_1) dig google.com -p 49200 @127.0.0.11

; <<>> DiG 9.16.1-Ubuntu <<>> google.com -p 49200 @127.0.0.11
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3925
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;google.com. IN A

;; ANSWER SECTION:
google.com. 118 IN A 172.217.20.206

;; Query time: 8 msec
;; SERVER: 127.0.0.11#49200(127.0.0.11)
;; WHEN: Thu Apr 28 19:51:16 CEST 2022
;; MSG SIZE rcvd: 55
```
@pkit I am not sure from your description what is not working with gVisor. Can you provide a concise explanation of what you expect to work and what is not working?
I am not averse to picking up netfilter settings at startup, but our iptables implementation is not as complete as Linux's, and picking up netfilter rules entails a bit of risk.
@kevinGC should have a better answer for you.
@hbhasker
DNS access to 127.0.0.11 is not working over port 53, because it's expected to be redirected to other ports by iptables rules, and none are copied over.
It also means there is a bit of a security risk in the current implementation, for example in a case where the startup rules deny some access, but gVisor rewrites it all to accept unconditionally...
I have started implementation anyway. Will update when we get to some results.
FYI, the place in moby/docker where that resolver is set up: here
btw, I was thinking about this and I'm not sure copying in the rules will work. If docker is redirecting to a loopback address like 127.0.0.11, that rule will direct traffic to the sentry's internal loopback. But dockerd does not listen on the sentry's internal loopback; it listens on the loopback of the host container's network namespace.
Your test entered the host container's network namespace, so dig was not running inside gVisor. For that you would have to use docker exec to enter gVisor and run the command.
Yup, did that too IIRC. But will re-check.
@hbhasker yup, you were right, it doesn't route to 127.0.0.11 from gvisor.
Will think about it.
@hbhasker still, not having iptables set up during boot is a security problem. Traffic may be restricted in vanilla docker/containerd but will be unrestricted under gVisor.
In general we do not trust the sentry, so any enforcement in the sentry isn't safe from potential compromise.
E.g. a privilege-escalation bug in gVisor could allow a user to delete any such network enforcement via iptables.
As such, we expect that all such enforcement should happen outside the container, on the host.
As for this specific problem, maybe the right solution for gVisor would be to generate a resolv.conf pointing to the proper IP/port for DNS.
That's an internal DNS server. It doesn't have a non-127.0.0.0/8 address.
It looks like it's really a docker problem; for example, https://github.com/containerd/nerdctl has no problem routing correctly, because it uses a dedicated, routed host DNS service.
But it's kind of bad for testing, as a lot of test tools still use docker.
Will update if I find something interesting.
For others running into this, I was able to work around this issue by setting network_mode: bridge in my compose.json. It seems the issue only happens with Compose-managed bridges. My resolv.conf in this configuration shows the default nameserver for my network.
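In Compose-file terms, the workaround looks roughly like this (the service and image names are illustrative, not from the thread):

```yaml
services:
  app:                      # illustrative service name
    image: ubuntu           # illustrative image
    runtime: runsc
    network_mode: bridge    # default docker0 bridge; bypasses the
                            # Compose-managed network and its 127.0.0.11 DNS
```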
A friendly reminder that this issue had no activity for 120 days.
This issue has been closed due to lack of activity.
Please reopen, because this issue is still present and a problem when using gVisor with docker compose.
This is a tough problem to solve. At a high level: putting an application inside gVisor sandboxes it, but Compose puts the DNS server outside the sandbox. I don't know of an especially simple solution.
As Ian mentioned in the other bug, the best way to deal with this is to change the DNS server in use. If you need the Compose DNS server specifically, you could modify iptables rules outside the sandbox to redirect DNS traffic to Compose DNS.
FWIW, we now support scraping netfilter rules from the host (see b119cc3 and e7e8d0f), which can be enabled via the --reproduce-nftables flag. But this still doesn't solve the problem: DNS packets are now redirected via the sandbox loopback, but Docker's DNS server is listening on the host loopback (as Bhasker mentioned above).
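For reference, one way to pass that flag is via the runsc runtime entry in /etc/docker/daemon.json. This is a sketch; the runsc path is an assumption about your install:

```json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": ["--reproduce-nftables"]
    }
  }
}
```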
I'd love a better solution than "add gVisor-specific configuration", but in this case I'm not sure what that would be.
Is there, maybe, a generic way to add an nftables nat/redirect rule to capture the DNS lookups from within the sandbox to the host loopback DNS server? I am thinking of a (pseudo) static IP address specified as DNS server for the containers in the gVisor sandbox (e.g. via docker-compose) that can be redirected generically to the compose-managed host DNS resolver. That might potentially be generic across all containers?
I'm not sure of a generic way to do it, but it could certainly be done. Note that I'm not familiar with Docker Compose, so I'm not sure how difficult this is.
- Add an iptables rule inside your containers that DNATs requests. For example, 127.0.0.11:53 (or *:53 if you want all DNS traffic) might become 10.1.2.3:53.
- Add an iptables rule outside your containers that DNATs 10.1.2.3:53 to the actual DNS server 127.0.0.11:40107.
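A sketch of that two-rule scheme in iptables terms. Here 10.1.2.3 is an arbitrary placeholder address, and 40107 stands in for whatever ephemeral port Docker's embedded DNS is actually bound to on a given run (it changes on every container start, so the outer rule would need to be generated dynamically):

```shell
# Inside the container (relies on runsc's iptables support): steer DNS
# for 127.0.0.11 to a routable placeholder address.
iptables -t nat -A OUTPUT -d 127.0.0.11 -p udp --dport 53 \
  -j DNAT --to-destination 10.1.2.3:53

# Outside the sandbox, in the container's network namespace: forward the
# placeholder on to Docker's embedded DNS at its ephemeral port.
iptables -t nat -A OUTPUT -d 10.1.2.3 -p udp --dport 53 \
  -j DNAT --to-destination 127.0.0.11:40107
```

Both commands require root in the appropriate network namespace, so they are shown as a configuration sketch only.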