not working on custom network
markg85 opened this issue · comments
Hi,
To be honest, it's super sad that this even needs to exist, which I can only blame Docker for, as their IPv6 support is so horrendously crappy...
It makes me very happy that this project is there to help people like me :)
I followed your guide and a command like this works:
docker run --rm -t busybox ping6 -c 4 google.com
But if I create a custom network:
docker network create --ipv6 --subnet fd00:dead:beef::/48 test
And then run a container in that network:
docker run --network test --rm -t busybox ping6 -c 4 google.com
It doesn't ping.
I might very well be missing something, but I have no clue what that might be.
Or am i trying something that is unsupported?
Best regards,
Mark
Hi Mark, thanks for trying it out. It's not unsupported; that should work. In the custom network case, did you start the ipv6nat container in the same way? It still needs to have --network host.
Could you share the command you're using to start the ipv6nat container? Also, the output of ip6tables-save (after you've started all containers) might be useful.
Hi Robbert,
Sure thing!
The command to start ipv6nat is:
docker run -d --name ipv6nat --privileged --network host --restart unless-stopped -v /var/run/docker.sock:/var/run/docker.sock:ro -v /lib/modules:/lib/modules:ro robbertkl/ipv6nat
As per your description on the main page of this project.
The ip6tables output (note: this is a Vultr node, spun up for this test purpose):
# Generated by ip6tables-save v1.8.4 on Mon Jul 27 16:03:35 2020
*nat
:PREROUTING ACCEPT [3:228]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d ::1/128 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -o br-2a4cee23ef2e -m addrtype --dst-type LOCAL -j MASQUERADE
-A POSTROUTING -s fd00:dead:beef::/48 ! -o br-2a4cee23ef2e -j MASQUERADE
-A POSTROUTING -o docker0 -m addrtype --dst-type LOCAL -j MASQUERADE
-A POSTROUTING -s fd00::/80 ! -o docker0 -j MASQUERADE
-A DOCKER -i br-2a4cee23ef2e -j RETURN
-A DOCKER -i docker0 -j RETURN
COMMIT
# Completed on Mon Jul 27 16:03:35 2020
# Generated by ip6tables-save v1.8.4 on Mon Jul 27 16:03:35 2020
*mangle
:PREROUTING ACCEPT [101:8272]
:INPUT ACCEPT [48:3552]
:FORWARD ACCEPT [32:2992]
:OUTPUT ACCEPT [165:17064]
:POSTROUTING ACCEPT [186:19192]
COMMIT
# Completed on Mon Jul 27 16:03:35 2020
# Generated by ip6tables-save v1.8.4 on Mon Jul 27 16:03:35 2020
*raw
:PREROUTING ACCEPT [116:9572]
:OUTPUT ACCEPT [165:17064]
COMMIT
# Completed on Mon Jul 27 16:03:35 2020
# Generated by ip6tables-save v1.8.4 on Mon Jul 27 16:03:35 2020
*security
:INPUT ACCEPT [48:3552]
:FORWARD ACCEPT [20:2080]
:OUTPUT ACCEPT [165:17064]
COMMIT
# Completed on Mon Jul 27 16:03:35 2020
# Generated by ip6tables-save v1.8.4 on Mon Jul 27 16:03:35 2020
*filter
:INPUT ACCEPT [8:688]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [27:2932]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -o br-2a4cee23ef2e -j DOCKER
-A FORWARD -o br-2a4cee23ef2e -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i br-2a4cee23ef2e ! -o br-2a4cee23ef2e -j ACCEPT
-A FORWARD -i br-2a4cee23ef2e -o br-2a4cee23ef2e -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i br-2a4cee23ef2e ! -o br-2a4cee23ef2e -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o br-2a4cee23ef2e -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
COMMIT
# Completed on Mon Jul 27 16:03:35 2020
I did try adding the ipv6nat container to the test network, but that didn't seem to make any difference. The ping still wouldn't work for a container in that same network.
@markg85 Can you restart Docker, try the ping again and post the output of ip6tables -nvL, which may provide additional information via the packet stats?
The current configuration seems fine:
-A FORWARD -i br-2a4cee23ef2e ! -o br-2a4cee23ef2e -j ACCEPT
should (finally) ACCEPT your traffic to the destination.
-A POSTROUTING -s fd00:dead:beef::/48 ! -o br-2a4cee23ef2e -j MASQUERADE
should (finally) MASQUERADE your traffic.
That second line fails for me.
❯ ip6tables -A POSTROUTING -s fd00:dead:beef::/48 ! -o br-2a4cee23ef2e -j MASQUERADE
ip6tables: No chain/target/match by that name.
This is the output you requested:
❯ ip6tables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
36 3176 DOCKER-USER all * * ::/0 ::/0
36 3176 DOCKER-ISOLATION-STAGE-1 all * * ::/0 ::/0
8 832 DOCKER all * docker0 ::/0 ::/0
8 832 ACCEPT all * docker0 ::/0 ::/0 ctstate RELATED,ESTABLISHED
8 832 ACCEPT all docker0 !docker0 ::/0 ::/0
0 0 ACCEPT all docker0 docker0 ::/0 ::/0
2 144 DOCKER all * br-2a4cee23ef2e ::/0 ::/0
0 0 ACCEPT all * br-2a4cee23ef2e ::/0 ::/0 ctstate RELATED,ESTABLISHED
9 684 ACCEPT all br-2a4cee23ef2e !br-2a4cee23ef2e ::/0 ::/0
2 144 ACCEPT all br-2a4cee23ef2e br-2a4cee23ef2e ::/0 ::/0
0 0 ACCEPT all br-2a4cee23ef2e !br-2a4cee23ef2e ::/0 ::/0
0 0 ACCEPT all br-2a4cee23ef2e !br-2a4cee23ef2e ::/0 ::/0
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
Chain DOCKER-ISOLATION-STAGE-1 (1 references)
pkts bytes target prot opt in out source destination
9 684 DOCKER-ISOLATION-STAGE-2 all br-2a4cee23ef2e !br-2a4cee23ef2e ::/0 ::/0
8 832 DOCKER-ISOLATION-STAGE-2 all docker0 !docker0 ::/0 ::/0
36 3176 RETURN all * * ::/0 ::/0
Chain DOCKER-ISOLATION-STAGE-2 (2 references)
pkts bytes target prot opt in out source destination
0 0 DROP all * br-2a4cee23ef2e ::/0 ::/0
0 0 DROP all * docker0 ::/0 ::/0
26 2200 RETURN all * * ::/0 ::/0
Chain DOCKER-USER (1 references)
pkts bytes target prot opt in out source destination
36 3176 RETURN all * * ::/0 ::/0
IPv6 is such a... thorough piece of crap to get working. I honestly can't wrap my head around how people could invent something so super vague. And that comes from me as a software developer. Imagine how totally "out of this world" IPv6 must feel to less technically educated people.
@markg85 MASQUERADE rules can only be used in the nat table. You might want to check the output of ip6tables -t nat -nvL.
I think you're assuming I know what that output means and how to interpret it? ;)
I have no clue!
❯ ip6tables -t nat -nvL
Chain PREROUTING (policy ACCEPT 9 packets, 684 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all * * ::/0 ::/0 ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 78 packets, 6237 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all * * ::/0 !::1 ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT 78 packets, 6237 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all * br-2a4cee23ef2e ::/0 ::/0 ADDRTYPE match dst-type LOCAL
0 0 MASQUERADE all * !br-2a4cee23ef2e fd00:dead:beef::/48 ::/0
0 0 MASQUERADE all * docker0 ::/0 ::/0 ADDRTYPE match dst-type LOCAL
2 208 MASQUERADE all * !docker0 fd00::/80 ::/0
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all br-2a4cee23ef2e * ::/0 ::/0
0 0 RETURN all docker0 * ::/0 ::/0
But to remind you: this is still just a plain Vultr node with Fedora.
If you want, I can give you access.
Give me your public key for SSH's authorized_keys file and you can play with it.
I can also spin up a new node with only the OS + Docker installed, nothing more (and cgroups set to v1 for Docker), so that you begin with a clean slate.
@markg85 As you can see, the corresponding NAT rule is not matched by your packets:
0 0 MASQUERADE all * !br-2a4cee23ef2e fd00:dead:beef::/48 ::/0
If you are able to create a clean system, you should check if it works on this machine.
I'm totally lost.
I'm following the exact lines on the main page, which apparently gives me that.
I can't make things up, as I simply can't think of what to type in or what means what.
Let's bisect what you say first, as that sounds alien to me too.
the corresponding NAT rule is not matched by your packages
NAT rule
(network address translation): I know the abbreviation. I "kinda" know what NAT does.
packages
... What packages? We're not talking about Fedora distribution packages, I assume. I'm guessing "network packets", but that too doesn't make much sense to me, as this seems to apply to the whole "adapter". I'm lost; what do you mean?
you should check if it works on this machine
... That's exactly what I'm doing and why I'm ending up here. I can't. And I can't "guess" the correct commands, as IPv6 made any guesswork require scientific papers and many decades of research before one gets its concepts.
Lastly, can I have your public SSH key? I'll spin up a new node and do absolutely nothing on it besides installing Docker + adding you to the authorized_keys file. Hopefully you can try out the case I have here.
I'm totally lost.
Okay, let's break it down to a few essential points.
All your Docker containers get a private IPv4 address, which is used for internal communication. As your Docker host (very likely) does not have enough public IPv4 addresses for each container, outgoing packets (with a destination outside your Docker network or Docker host) have to use NAT (Network Address Translation), which basically provides a way to communicate with services outside your Docker host using the public IPv4 address of your Docker host.
IPv6 provides a huge address space, so you might be able to directly assign public IPv6 addresses to your containers, in which case you do not need NAT. As this is a completely different solution compared to Docker's IPv4 handling, Docker-IPv6-NAT simply does exactly what Docker already does for IPv4: NAT, but for IPv6.
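To make that concrete: the core of what Docker-IPv6-NAT manages can be sketched as a single ip6tables rule in the nat table. This is only an illustration of what the container sets up for you, not something you need to run by hand; the bridge name and ULA subnet below are the ones from this thread, so substitute your own.

```shell
# Sketch: masquerade anything sourced from the network's IPv6 subnet that
# leaves through an interface other than the network's own bridge.
# br-2a4cee23ef2e and fd00:dead:beef::/48 are specific to this thread.
ip6tables -t nat -A POSTROUTING \
  -s fd00:dead:beef::/48 ! -o br-2a4cee23ef2e -j MASQUERADE
```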
So if you ping Google, your ICMPv6 Echo Request packets will leave the docker network and the docker host. As a result of this, the following rule should be applied:
Chain POSTROUTING
0 0 MASQUERADE all * !br-2a4cee23ef2e fd00:dead:beef::/48 ::/0
"Please MASQUERADE (which you want for your IPv6 NAT setup) all packets which leaves the docker network (you created) and are sent from containers attached to this network."
As you can see, it works for the default bridge docker0:
2 208 MASQUERADE all * !docker0 fd00::/80 ::/0
This might be a misconfiguration of your Docker network, a routing issue or a DNS issue, as I cannot see any ip6tables rules that might drop your outgoing packets.
Can you please provide the output from your docker host for:
docker network inspect test
ip -6 route
And from a Docker container spawned with docker run --network test --rm -it debian:stable bash:
ip addr
cat /etc/resolv.conf
ip -4 route
ip -6 route
Lastly, can i have your public ssh key?
Short answer: no. Long answer: I prefer not to log into other people's machines.
But with the requested output (see above), we might be able to solve your problem.
@bephinix Thank you for that elaborate explanation! Really awesome!
I think I'm at the point of knowing just enough about it to know that I know no shit about it :) (if that makes sense, hehe)
The way you describe it sounds logical to me and is how I would want it to work. I don't want to manage every individual container on its own to keep its firewall in order. I want my containers to all be private unless I expose ports to the outside world (which Docker conceptually allows just fine). So I think that, conceptually, NAT is a fitting solution. In fact, I'd argue that configuring each instance individually (if each had a public IPv6 address) is much more of a hassle and a performance overhead that is just not needed. The nodes I have now all have their own local IPv4 address and that's perfectly fine. I don't need or even want them to be public.
Here's a lot of information you requested :)
❯ docker network inspect test
[
{
"Name": "test",
"Id": "2a4cee23ef2ed89554c9a36f0b79ff1ccfc7f5ee0c2ccd571b4081fcb7233aa8",
"Created": "2020-07-27T15:38:16.964289023Z",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": true,
"IPAM": {
"Driver": "default",
"Options": {},
"Config": [
{
"Subnet": "172.22.0.0/16",
"Gateway": "172.22.0.1"
},
{
"Subnet": "fd00:dead:beef::/48"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {},
"Options": {},
"Labels": {}
}
]
❯ ip -6 route
::1 dev lo proto kernel metric 256 pref medium
2001:19f0:5001:1d26::/64 dev ens3 proto ra metric 100 pref medium
fd00::/80 dev docker0 proto kernel metric 256 linkdown pref medium
fd00::/80 dev docker0 metric 1024 linkdown pref medium
fd00:dead:beef::/48 dev br-2a4cee23ef2e proto kernel metric 256 linkdown pref medium
fe80::/64 dev ens3 proto kernel metric 100 pref medium
fe80::/64 dev docker0 proto kernel metric 256 linkdown pref medium
fe80::/64 dev br-2a4cee23ef2e proto kernel metric 256 linkdown pref medium
default via fe80::fc00:2ff:fee8:c673 dev ens3 proto ra metric 100 pref medium
And here are the outputs from within the container (docker run --network test --rm -it debian:stable bash):
root@1258d023b558:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
44: eth0@if45: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:16:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.22.0.2/16 brd 172.22.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fd00:dead:beef::2/48 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::42:acff:fe16:2/64 scope link
valid_lft forever preferred_lft forever
root@1258d023b558:/# cat /etc/resolv.conf
nameserver 127.0.0.11
nameserver 2001:19f0:300:1704::6
options ndots:0
root@1258d023b558:/# ip -4 route
default via 172.22.0.1 dev eth0
172.22.0.0/16 dev eth0 proto kernel scope link src 172.22.0.2
root@1258d023b558:/# ip -6 route
fd00:dead:beef::/48 dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via fd00:dead:beef::1 dev eth0 metric 1024 pref medium
Note that if you don't trust my box, that's fine :) You can spin up a Vultr node too, hehe. Tell me if you want to go that route, as I can give you the exact commands I used to get it up and running.
@markg85 Okay, I would guess that the cause of the broken IPv6 ping are misconfigured devices, routes or addresses.
As you see, the docker network you created does not have a gateway set for IPv6; it should look like this:
"Config": [
{
"Subnet": "172.17.0.0/16",
"Gateway": "172.17.0.1"
},
{
"Subnet": "fd00:dead:beef::/48",
"Gateway": "fd00:dead:beef::1" ## This line is missing! ##
}
]
Can you check which IPv6 addresses are configured for your Docker host by using ip addr? (You may want to mask your public IPs.)
Tell me if you want to go that route as i can give you the exact commands i used to get it up and running.
This might be useful. Please also add your Linux Kernel version and the name and version of the OS you are using.
There must be more that's wrong.
The help on the main page does not mention that the gateway needs to be added.
I removed the network and recreated it:
docker network create --ipv6 --subnet fd00:dead:beef::/48 --gateway fd00:dead:beef::1 test
And ran: docker run --network test --rm -it debian:stable bash
Now we're even further gone.
ping6 google.nl <-- no result
ping google.nl <-- no result
The network looks like this now:
❯ docker network inspect test
[
{
"Name": "test",
"Id": "bae07ad09bdc83a5422fcef0f51f141b9d6f7894b4803ea2d8d5812480b8ba94",
"Created": "2020-08-02T15:49:24.476825663Z",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": true,
"IPAM": {
"Driver": "default",
"Options": {},
"Config": [
{
"Subnet": "172.23.0.0/16",
"Gateway": "172.23.0.1"
},
{
"Subnet": "fd00:dead:beef::/48",
"Gateway": "fd00:dead:beef::1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {},
"Options": {},
"Labels": {}
}
]
To be fair, I think we could keep doing this all next week and still get no further.
I really think it would be best if you could either spin up a Vultr node, or I spin one up and give you access to it.
I obviously have no clue what I'm doing; you obviously do. You can test things much faster if you don't have hours of latency between your questions and my answers :)
@markg85 Okay, let's start with how you created your VM/Node. Can you post the steps and the OS and kernel version you are running on your node?
Vultr: I took Fedora 32 x64 (32 is the version, don't be fooled by that number).
Log on to your node.
Note: everything written in these blocks is a command I'm executing.
Updating everything:
dnf update -y
and reboot
Go back to cgroups v1, as Docker doesn't like v2 yet.
And I'm not willing to use "podman".
grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
and reboot
Install docker (moby)
dnf install moby-engine -y
And enable/start it
systemctl enable docker --now
Test if it works:
docker run --rm -t busybox ping -c 4 google.com <-- gives 4 ping results
docker run --rm -t busybox ping6 -c 4 google.com <-- fails (unsurprisingly, as I did nothing with IPv6 in this case)
And done :)
From this moment on you have a working docker environment.
Next is getting the NAT + IPv6 stuff working.
Right now on that particular node my kernel is: Linux ipv6-test 5.7.11-200.fc32.x86_64 #1 SMP Wed Jul 29 17:15:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
but you might very well have a different one. Also, I don't think the kernel version matters. Assumption... sometimes dangerous :P but hey, IPv6... it's been "alive" for so many years already. I sure as hell am not running a 2.0 kernel or something ancient like it.
Good luck!
@markg85 Thx, I will check it out.
@markg85 Can you test the following command: docker run --network test --rm -t busybox ping6 -c 4 2001:4860:4860::8888 ?
Yes.
That works! Why? lol
@markg85 I think I found your error. Well, it is not your error.
It was the DNS.
Docker itself and Docker-IPv6-NAT use iptables/ip6tables to set up firewall rules for isolating networks and containers and for managing NAT. There are (at least) two kernel subsystems which can handle these "filter operations": nf_tables (new) and x_tables (legacy).
As Docker currently does not support nf_tables, it uses x_tables as the iptables backend (and so does Docker-IPv6-NAT). Fedora 32 uses nf_tables for its firewall, which breaks Docker networking; I was not able to resolve hostnames on IPv4-only bridges either.
If you use a custom bridge network (not docker0), Docker will configure its own DNS resolver to be able to resolve container hostnames. If you are using nf_tables, those forwarded DNS queries will not succeed.
It might work with docker0, as this default bridge is not a normal bridge network.
This nf_tables/x_tables issue is also present in Debian 10 (or was).
It took me some hours to find the underlying cause, and I wrote a tutorial for a hosting provider which contains some information about it and a workaround; you might want to check it out.
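A quick way to see which backend is in play on a given machine: the iptables version string reports it. This is a general diagnostic, not specific to this project; the exact output depends on your distribution and iptables version.

```shell
# Prints something like "ip6tables v1.8.4 (nf_tables)" or "... (legacy)".
# "(legacy)" is the x_tables backend that Docker expects.
ip6tables -V
iptables -V
```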
How to Fix
Edit /etc/firewalld/firewalld.conf, change FirewallBackend=nftables to FirewallBackend=iptables, and reboot.
It should work now!
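For the record, the same edit can be scripted. This sketch applies the substitution to a temporary copy, so nothing on the live system is touched; the real fix targets /etc/firewalld/firewalld.conf as root, followed by a reboot.

```shell
# Work on a throwaway copy containing the relevant firewalld.conf line.
conf=$(mktemp)
printf 'FirewallBackend=nftables\n' > "$conf"

# Switch firewalld from the nf_tables backend to the iptables (x_tables) one.
sed -i 's/^FirewallBackend=nftables$/FirewallBackend=iptables/' "$conf"

grep '^FirewallBackend=' "$conf"   # FirewallBackend=iptables
rm -f "$conf"
```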
Ahh, so that's why you kept asking for the kernel ;)
See, this is going a lot faster!
Weirdly enough, this didn't work right away after applying your fix.
I undid it, only to find that nothing worked anymore (no ping at all, IPv4 or IPv6).
I redid your fix and now, for whatever creepy magical mystical reason, it all of a sudden does work...
For now, it works.
But I'm really expecting this to fail for whatever reason.
Also, the guide on the main page doesn't tell me to add the IPv6 gateway in the network create line. Could you fix that up?
And perhaps mention this whole bla_tables shebang in there too ;)
@markg85 Great that it works now! You should use an OS which is officially supported by Docker, e.g. Fedora 31.
Also, the guide on the main page doesn't tell me to add the ipv6 gateway in the network create line.
Well, you do not have to. I am using Debian 10 and Arch Linux and the gateway is set correctly. There is also a bug in docker network inspect which does not show the gateway although it is set.
And perhaps mention this whole bla_tables shebang in there too ;)
Yes, I will create a PR.
@robbertkl This should be solved.
Thanks!
Perhaps choosing Fedora 32 added to the difficulties.
Do know that I start a node with the intention of keeping it running for quite a while (years sometimes), so I prefer choosing the latest version.
Thank you very much for all your help!
Now I can - finally - get nginx to shut up about IPv6 certificates that it can't check... It's not blocking anything, just a warning in the log that I wanted to get rid of, which triggered this endeavor.