srl-labs / clabernetes

containerlab, but in kubernetes!

Interface and static route cleanup on transient launch failure

flaminidavid opened this issue · comments

The cleanup issue mentioned in #143 actually impacts external connectivity. For example, if the launcher fails to bring up the router container because the container image is not (yet) available, it leaves stale static routes and bridges behind. The kernel then resolves the route to the container via the wrong bridge interface.

[*]─[REDACTED_HOSTNAME]─[/clabernetes]
└──> ssh REDACTED_HOSTNAME
ssh: connect to host REDACTED_HOSTNAME port 22: No route to host <<< SSH fails with no route message :(

[x]─[REDACTED_HOSTNAME]─[/clabernetes]
└──> clab inspect
INFO[0000] Parsing & checking topology file: topo.clab.yaml 
+---+------------------+--------------+------------------------------------------+------+---------+----------------+----------------------+
| # |       Name       | Container ID |                  Image                   | Kind |  State  |  IPv4 Address  |     IPv6 Address     |
+---+------------------+--------------+------------------------------------------+------+---------+----------------+----------------------+
| 1 | REDACTED_HOSTNAME | d9d8482e94d5 | registry:80/REDACTED_OS-lab/REDACTED_OS-lab:latest | REDACTED_OS | running | 172.20.20.2/24 | 2001:172:20:20::2/64 |
+---+------------------+--------------+------------------------------------------+------+---------+----------------+----------------------+

[*]─[REDACTED_HOSTNAME]─[/clabernetes]
└──> ip route 
default via 169.254.1.1 dev eth0 
169.254.1.1 dev eth0 scope link 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
172.20.20.0/24 dev br-9e0abc3fc5c4 proto kernel scope link src 172.20.20.1 linkdown <<< Static route installed during previous REDACTED_HOSTNAME launch attempt which failed (image not available yet)
172.20.20.0/24 dev br-8e01cb23f4e2 proto kernel scope link src 172.20.20.1 linkdown <<< Static route installed during previous REDACTED_HOSTNAME launch attempt which failed (image not available yet)
172.20.20.0/24 dev br-99c0d2f2be16 proto kernel scope link src 172.20.20.1 linkdown <<< Static route installed during previous REDACTED_HOSTNAME launch attempt which failed (image not available yet)
172.20.20.0/24 dev br-0a9d5d103385 proto kernel scope link src 172.20.20.1 linkdown <<< Static route installed during previous REDACTED_HOSTNAME launch attempt which failed (image not available yet)
172.20.20.0/24 dev br-70f3bba49d02 proto kernel scope link src 172.20.20.1 linkdown <<< Static route installed during previous REDACTED_HOSTNAME launch attempt which failed (image not available yet)
172.20.20.0/24 dev br-ade82612ffc0 proto kernel scope link src 172.20.20.1 <<< Static route installed during last launch attempt, which was successful

[*]─[REDACTED_HOSTNAME]─[/clabernetes]
└──> ip route get 172.20.20.2
172.20.20.2 dev br-9e0abc3fc5c4 src 172.20.20.1 uid 0 <<< Kernel resolves to first interface with link down due to static route
    cache 

[*]─[REDACTED_HOSTNAME]─[/clabernetes]
└──> sysctl -w net.ipv4.conf.br-9e0abc3fc5c4.ignore_routes_with_linkdown=1 <<< Set Kernel to ignore routes with link down on this interface
net.ipv4.conf.br-9e0abc3fc5c4.ignore_routes_with_linkdown = 1

[*]─[REDACTED_HOSTNAME]─[/clabernetes]
└──> ip route get 172.20.20.2
172.20.20.2 dev br-8e01cb23f4e2 src 172.20.20.1 uid 0 <<< Kernel resolves to second interface with link down due to static route
    cache 

[*]─[REDACTED_HOSTNAME]─[/clabernetes]
└──> for down_bridge in $(ip link show type bridge | grep br- | awk -F: '{print $2}'); do sysctl -w net.ipv4.conf.$down_bridge.ignore_routes_with_linkdown=1; done
net.ipv4.conf.br-9e0abc3fc5c4.ignore_routes_with_linkdown = 1 <<< Set Kernel to ignore routes with link down on this interface
net.ipv4.conf.br-8e01cb23f4e2.ignore_routes_with_linkdown = 1 <<< Set Kernel to ignore routes with link down on this interface
net.ipv4.conf.br-99c0d2f2be16.ignore_routes_with_linkdown = 1 <<< Set Kernel to ignore routes with link down on this interface
net.ipv4.conf.br-0a9d5d103385.ignore_routes_with_linkdown = 1 <<< Set Kernel to ignore routes with link down on this interface
net.ipv4.conf.br-70f3bba49d02.ignore_routes_with_linkdown = 1 <<< Set Kernel to ignore routes with link down on this interface
net.ipv4.conf.br-ade82612ffc0.ignore_routes_with_linkdown = 1 <<< Set Kernel to ignore routes with link down on this interface

[*]─[REDACTED_HOSTNAME]─[/clabernetes]
└──> ip route get 172.20.20.2
172.20.20.2 dev br-ade82612ffc0 src 172.20.20.1 uid 0 <<< Kernel resolves to the only interface without link down
    cache 
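
For reference, the per-bridge loop can probably be collapsed into the all/default knobs, since the kernel takes the maximum of the "all" and per-interface values for this setting (worth verifying on the kernel in use):

sysctl -w net.ipv4.conf.all.ignore_routes_with_linkdown=1       # existing interfaces
sysctl -w net.ipv4.conf.default.ignore_routes_with_linkdown=1   # interfaces created later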

[x]─[REDACTED_HOSTNAME]─[/clabernetes]
└──> ssh REDACTED_HOSTNAME
Warning: Permanently added 'REDACTED_HOSTNAME' (...) to the list of known hosts.
(admin@REDACTED_HOSTNAME) Password: 
REDACTED_HOSTNAME> <<< SSH succeeds :)
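
The sysctl knobs only mask the stale routes; a more direct cleanup, sketched below, is to delete the orphaned bridges themselves, which also removes their connected routes (the bridge names are taken from the ip route output above, so double-check nothing is still attached to a bridge before deleting it):

# collect bridges that back linkdown routes and delete them
for stale_bridge in $(ip route | awk '/linkdown/ && /dev br-/ {print $3}' | sort -u); do
    ip link del "$stale_bridge"
done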

@flaminidavid I wonder, you mentioned that the image is not available for some time, but the output shows the registry is being used, and not a local daemon store.

Don't you have images pushed in your private registry ahead of time before launching the lab?

@carlmontanari I wonder if, regardless of the above comment, all we need to do is call clab dep -c <topo>, with the -c flag doing a clean redeployment that should remove the unused docker bridge.
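
For context, that would amount to the launcher running something along these lines on every (re)launch attempt, with --reconfigure (-c) tearing down any remnants of a previous attempt before deploying (a sketch based on the containerlab CLI as I understand it):

clab deploy -t topo.clab.yaml --reconfigure
# shorthand: clab dep -c -t topo.clab.yaml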

if the launcher fails to bring up the router container because the container image is not (yet) available

then the launcher should be crashing. it seems you are exec'ing onto the launcher to test stuff, which is cool, but not how this works "irl". in your case it seems maybe you run stuff manually then re-launch after the image is ready or something (lots of assumptions here, obv shout if they are incorrect!). I think none of this is a problem in the normal flow, as the launcher would always crash after any failure and then we'd have a fresh container to try again. Guess we can add the -c flag if it matters, but I don't think it "should".

Sorry folks, I was out.

Don't you have images pushed in your private registry ahead of time before launching the lab?

I agree, I should. I'm probably abusing the launcher's ability to retry here.
I guess my point is that if something (else) fails during launch the launcher ends up in this dirty state.

it seems maybe you run stuff manually then re-launching after the image is ready or something

No, it's just a workaround for discard_unpacked_layers. Basically I deploy a separate registry (and puller) in the same namespace at the same time I deploy a topology via Clabernetes. I should be able to get rid of all that with the Docker config feature you added in #143. I'm not doing anything to the launcher, it's just retrying until the image is there. I don't know what else could make the launcher fail in a similar way, but if it did it would end up dirty (those static routes and bridges left behind).

the launcher would always crash after any failure then we'd have a fresh container to try again

Looks like the launcher container itself doesn't crash, only the router container inside does. Those static routes and bridges live in the launcher and stick around after the router container failure.

Looks like the launcher container itself doesn't crash, only the router container inside does. Those static routes and bridges live in the launcher and stick around after the router container failure.

can you share logs of that? if clab exits non-zero it should crash, and if the container starts then goes away, it should crash. if you were on the pod (like made the entry point sleep or something and exec'd on), then it would just exit but not crash, since clabernetes is not the entry point in that case. tl;dr: yeah, if it doesn't crash it should, so logs should help pinpoint where, but it really should be crashing properly hah. I hope :D
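
For anyone trying to capture this, a quick sketch of pulling the launcher logs (namespace and pod names are placeholders; --previous only works if the container actually restarted):

kubectl get pods -n <namespace> -o wide
kubectl logs -n <namespace> <launcher-pod>              # current attempt
kubectl logs -n <namespace> <launcher-pod> --previous   # prior attempt, if it crashed and restarted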

unsure if it will be the "fix" but am adding the -c in #159 just in case. will prolly cut 0.1.2 today, so then maybe you can test that and let us know if it sorts it!

Thanks! Upgraded today to 0.1.3; it did not fix it. I'll test passing the Docker config in a bit, which should be enough for my use case.

I checked that the launcher is using 0.1.3: Image: ghcr.io/srl-labs/clabernetes/clabernetes-launcher:0.1.3

[/clabernetes]
└──> ip route | grep br
172.20.20.0/24 dev br-34417135bcc2 proto kernel scope link src 172.20.20.1 linkdown 
172.20.20.0/24 dev br-9082e6489457 proto kernel scope link src 172.20.20.1 linkdown 
172.20.20.0/24 dev br-c455be425faa proto kernel scope link src 172.20.20.1 linkdown 
172.20.20.0/24 dev br-4c04d38d7c33 proto kernel scope link src 172.20.20.1 linkdown 
172.20.20.0/24 dev br-6dbd9cfb63a7 proto kernel scope link src 172.20.20.1 linkdown 
172.20.20.0/24 dev br-5bbea88fefad proto kernel scope link src 172.20.20.1 linkdown 
172.20.20.0/24 dev br-2a67784d5de3 proto kernel scope link src 172.20.20.1 linkdown 
172.20.20.0/24 dev br-db61001317a1 proto kernel scope link src 172.20.20.1 

@flaminidavid what does docker network ls say in your case?

[/clabernetes]
└──> docker network ls
NETWORK ID     NAME      DRIVER    SCOPE
a7671c32b0fa   bridge    bridge    local
db61001317a1   clab      bridge    local
5225d54edb9d   host      host      local
e8c80bd3b98a   none      null      local

In other news, overriding the Docker config via spec.imagePull.dockerConfig worked just fine (and I switched pullThroughOverride back from never to auto). This works with AWS ECR, too.
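
For anyone else wiring this up, a rough sketch of the shape this takes; the secret name below is a placeholder, and the assumption that dockerConfig references a namespace-local secret holding a docker config.json is exactly that, an assumption, so check the clabernetes docs for the authoritative schema:

# load registry credentials into the topology's namespace (secret name is hypothetical)
kubectl create secret generic clab-docker-config \
    --from-file=config.json=$HOME/.docker/config.json -n <namespace>

# then reference it from the topology spec (field names as mentioned in this thread):
#   spec:
#     imagePull:
#       pullThroughOverride: auto
#       dockerConfig: clab-docker-config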

damn haha ok cool, thanks a bunch for updating us all. will prolly futz around with this this weekend. I think I accidentally encountered the same issue (recreating it may be another story!), so I can for sure see if we can fix it... will keep ya posted!