projectcalico / calico-bgp-daemon

GoBGP based Calico BGP Daemon

IPIP routes may not be fixed at start of day if IPIP disabled while daemon not running

robbrockbank opened this issue

Expected Behavior

Not 100% sure that this is the case, so it needs to be investigated.

This was an issue we were having with the bird backend that we fixed under projectcalico/bird#53

In summary: a route is programmed to use the tunnel, calico/node is shut down, the IP pool is updated such that the route should no longer use the tunnel, and calico/node is restarted. The route does not get re-programmed to stop using the tunnel. More repro details are in the bird issue.

The bird issue is resolved and a test has been added to the calico STs: projectcalico/calico#1625

I couldn't get this test working with the go-bgp backend, so either go-bgp suffers from the same issue, or the test is bad. Either way, this needs a little more investigation.

Current Behavior

IPIP route may not be (un-)programmed correctly at start of day.

Possible Solution

Route scan at start of day should fix up the routes.
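
As a rough illustration of what such a start-of-day scan could do, here is a minimal sketch in Go. It is not the daemon's actual code: the `Route` type, the `Reconcile` function, and the device names are hypothetical, and a real implementation would read and write kernel routes rather than work on in-memory slices. The idea is just to compare each installed route against the current IP pool setting and collect the ones whose device is stale.

```go
package main

import "fmt"

// Route is a hypothetical, simplified view of an installed kernel route.
type Route struct {
	Dst string // destination CIDR
	Dev string // output device the route currently uses
}

// tunnelDev is the IPIP tunnel device name assumed for this sketch.
const tunnelDev = "tunl0"

// Reconcile compares installed routes against the current pool configuration
// and returns the routes that must be re-programmed. ipipEnabled reports
// whether the pool covering dst currently has IPIP enabled; directDev is the
// device to use when it does not (an assumption of this sketch).
func Reconcile(installed []Route, ipipEnabled func(dst string) bool, directDev string) []Route {
	var fixes []Route
	for _, r := range installed {
		want := directDev
		if ipipEnabled(r.Dst) {
			want = tunnelDev
		}
		if r.Dev != want {
			fixes = append(fixes, Route{Dst: r.Dst, Dev: want})
		}
	}
	return fixes
}

func main() {
	// The route was programmed via the tunnel before the daemon stopped.
	installed := []Route{{Dst: "192.168.1.0/26", Dev: tunnelDev}}
	// IPIP was disabled on the pool while the daemon was not running.
	ipipDisabled := func(string) bool { return false }

	for _, r := range Reconcile(installed, ipipDisabled, "eth0") {
		fmt.Printf("re-program %s via %s\n", r.Dst, r.Dev)
	}
}
```

Running the scan once at startup, before processing BGP updates, would cover the case where the pool changed while the daemon was down.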

Steps to Reproduce (for bugs)

See the linked bird issue (projectcalico/bird#53) or the linked calico ST (projectcalico/calico#1625).

Context

Your Environment

calico/node STs on the v2.6.x-series branch.

I tried the ST. test_issue_1584 fails when calling assert_ipip_routing() just after the calico-node on host1 is removed.
Looking at the calico-node log on host2, I found that calico-bgp-daemon (referred to as bgpd below) detects that the peer is down and removes the IPIP route.

The bgpd on host2 detects this and removes the route as soon as the calico-node on host1 is removed, whereas bird seems to take about 2 minutes to detect that the peer is down.
If assert_ipip_routing() is executed before the route is deleted, it succeeds.
I opened the following PR:

projectcalico/calico#1749

but the bird seems to take about 2 minutes detecting peer down.

Yes, this is the desired behavior with BGP graceful restart enabled. The peer should not remove its routes until the neighbor restart timer expires.

See here: http://bird.network.cz/?get_doc&v=20&f=bird-6.html#bgp-graceful-restart

graceful restart time number

  The restart time is announced in the BGP graceful restart capability and specifies how long the neighbor would wait for the BGP session to re-establish after a restart before deleting stale routes. Default: 120 seconds.
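
To make the timing concrete, here is a minimal Go sketch of the decision a graceful-restart-aware peer makes. It is only an illustration: the `Peer` type and `StaleExpired` method are hypothetical names, not part of bird or GoBGP, and the 120 second value is the BIRD default quoted above. Stale routes are kept until the restart time has elapsed since the session went down.

```go
package main

import (
	"fmt"
	"time"
)

// defaultRestartTime mirrors the BIRD default quoted above (120 seconds).
const defaultRestartTime = 120 * time.Second

// Peer is a hypothetical record of a neighbor's graceful restart settings.
type Peer struct {
	RestartTime time.Duration // restart time announced in the GR capability
}

// StaleExpired reports whether routes learned from the peer should be
// deleted, given how long ago the session went down.
func (p Peer) StaleExpired(sinceDown time.Duration) bool {
	return sinceDown >= p.RestartTime
}

func main() {
	p := Peer{RestartTime: defaultRestartTime}
	for _, elapsed := range []time.Duration{5 * time.Second, 119 * time.Second, 120 * time.Second} {
		fmt.Printf("peer down for %v: delete stale routes = %v\n", elapsed, p.StaleExpired(elapsed))
	}
}
```

Under these assumptions, a check made a few seconds after the peer goes down still sees the routes, which matches the roughly 2 minute delay observed with bird.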

Hmm, graceful restart support should be added to bgpd.
But I won't be able to start on it immediately, as I am busy with other work right now.