Problems with BGP routes on FRR 10.0-01
cliff-ha opened this issue · comments
Description
After upgrading FRR from version 9.1 to version 10.0-01, routes are no longer installed correctly on the server.
As soon as we downgrade FRR, everything starts working again.
We have multiple interfaces on the server (eth0, eth1, eth2). The BGP peers are on eth1 and eth2, and the same routes are received on both interfaces, but they are installed as if they had been received on eth0:
Routing entry for 100.64.1.68/32
Known via "bgp", distance 200, metric 0, best
Last update 00:12:29 ago
100.64.1.248 (recursive), weight 1
* 10.0.9.193, via eth0, weight 1
100.64.1.250 (recursive), weight 1
10.0.9.193, via eth0 (duplicate nexthop removed), weight 1
BGP routing table entry for 100.64.1.68/32, version 33
Paths: (2 available, best #2, table default)
Not advertised to any peer
Local
100.64.1.250 (metric 100) from 100.64.1.250 (195.191.143.22)
Origin IGP, localpref 100, valid, internal, multipath
Originator: 195.191.143.22, Cluster list: 195.191.143.10
Last update: Wed May 29 10:19:09 2024
Local
100.64.1.248 (metric 100) from 100.64.1.248 (195.191.143.22)
Origin IGP, localpref 100, valid, internal, multipath, best (Neighbor IP)
Originator: 195.191.143.22, Cluster list: 195.191.143.10
Last update: Wed May 29 10:18:46 2024
eth1 up default 100.64.1.251/31
eth2 up default 100.64.1.249/31
With IPv6, the routes are simply marked as invalid because the nexthop IP is considered inaccessible, even though the peer IP is a direct neighbor.
Version
FRRouting 10.0 (#serverName) on Linux(5.14.0-427.18.1.el9_4.x86_64).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
'--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-static' '--disable-werror' '--enable-multipath=256' '--enable-vtysh' '--enable-ospfclient' '--enable-ospfapi' '--enable-rtadv' '--enable-ldpd' '--enable-pimd' '--enable-pim6d' '--enable-pbrd' '--enable-nhrpd' '--enable-eigrpd' '--enable-babeld' '--enable-vrrpd' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-fpm' '--enable-watchfrr' '--disable-bgp-vnc' '--enable-isisd' '--enable-rpki' '--enable-bfdd' '--enable-pathd' '--enable-snmp' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig' 'CC=gcc' 'CXX=g++' 'LT_SYS_LIBRARY_PATH=/usr/lib64:'
How to reproduce
Have multiple interfaces on the server and then have BGP peers on eth1 and eth2.
Expected behavior
Routes are installed on the interfaces on which they are received.
Actual behavior
Routes are installed as if they were received on eth0.
Additional context
No response
Checklist
- I have searched the open issues for this bug.
- I have not included sensitive information in this report.
Could you show the configuration also?
Sure.
The configuration looks like this:
router bgp 65500
bgp router-id 100.64.1.65
no bgp default ipv4-unicast
neighbor 100.64.1.248 remote-as 65500
neighbor 100.64.1.250 remote-as 65500
neighbor 2001:db8:2::2 remote-as 65500
neighbor 2001:db8:3::2 remote-as 65500
!
address-family ipv4 unicast
network 100.64.1.65/32
neighbor 100.64.1.248 activate
neighbor 100.64.1.248 prefix-list pl-ipv4-wrt-in in
neighbor 100.64.1.248 prefix-list pl-ipv4-wrt-out out
neighbor 100.64.1.250 activate
neighbor 100.64.1.250 prefix-list pl-ipv4-wrt-in in
neighbor 100.64.1.250 prefix-list pl-ipv4-wrt-out out
exit-address-family
!
address-family ipv6 unicast
network 2001:db8:fffe::2/128
neighbor 2001:db8:2::2 activate
neighbor 2001:db8:2::2 prefix-list pl-ipv6-wrt-in in
neighbor 2001:db8:2::2 prefix-list pl-ipv6-wrt-out out
ip prefix-list pl-ipv4-wrt-in seq 10 permit 100.64.1.64/27 ge 32 le 32
ip prefix-list pl-ipv4-wrt-out seq 10 permit 100.64.1.65/32
ipv6 prefix-list pl-ipv6-wrt-out seq 10 permit 2001:db8:fffe::2/128
ipv6 prefix-list pl-ipv6-wrt-in seq 10 permit 2001:db8:8000::/49 ge 128 le 128
Could you also show the interface configuration, i.e. the output of "ip add show"? (I want to see the exact configuration, including eth0.)
Sure, I have also added some additional commands.
As you can see below, the IPv4 routes are installed as reachable over eth0, while the IPv6 routes are not installed at all.
#ip add show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:83:b2:af brd ff:ff:ff:ff:ff:ff
altname enp11s0
altname ens192
inet 10.16.8.66/28 brd 10.16.8.79 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe83:b2af/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:b1:2c:c8 brd ff:ff:ff:ff:ff:ff
altname enp19s0
altname ens224
inet 100.64.1.247/31 scope global noprefixroute eth1
valid_lft forever preferred_lft forever
inet6 2001:db8:4::3/127 scope global noprefixroute
valid_lft forever preferred_lft forever
inet6 fe80::fa50:8295:449f:c7c3/64 scope link noprefixroute
valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:b1:e0:18 brd ff:ff:ff:ff:ff:ff
altname enp27s0
altname ens256
inet 100.64.1.245/31 scope global noprefixroute eth2
valid_lft forever preferred_lft forever
inet6 2001:db8:5::3/127 scope global noprefixroute
valid_lft forever preferred_lft forever
inet6 fe80::1513:a0d6:c642:1d9d/64 scope link noprefixroute
valid_lft forever preferred_lft forever
#ip route
default via 10.16.8.65 dev eth0 proto static metric 100
10.16.8.64/28 dev eth0 proto kernel scope link src 10.16.8.66 metric 100
100.64.1.64 nhid 25 via 10.16.8.65 dev eth0 proto bgp metric 20
100.64.1.65 nhid 25 via 10.16.8.65 dev eth0 proto bgp metric 20
100.64.1.67 nhid 25 via 10.16.8.65 dev eth0 proto bgp metric 20
100.64.1.68 nhid 25 via 10.16.8.65 dev eth0 proto bgp metric 20
100.64.1.69 nhid 25 via 10.16.8.65 dev eth0 proto bgp metric 20
100.64.1.244/31 dev eth2 proto kernel scope link src 100.64.1.245 metric 102
100.64.1.246/31 dev eth1 proto kernel scope link src 100.64.1.247 metric 101
# ip -6 route
::1 dev lo proto kernel metric 256 pref medium
2001:db8:4::2/127 dev eth1 proto kernel metric 101 pref medium
2001:db8:5::2/127 dev eth2 proto kernel metric 102 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev eth1 proto kernel metric 1024 pref medium
fe80::/64 dev eth2 proto kernel metric 1024 pref medium
#show bgp ipv6 unicast
BGP table version is 1, local router ID is 100.64.1.66, vrf id 0
Default local pref 100, local AS 65500
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
i2001:db8:fffa::2/128
2001:db8:5::2
100 0 i
i 2001:db8:4::2
100 0 i
i2001:db8:fffb::2/128
2001:db8:5::2
100 0 i
i 2001:db8:4::2
100 0 i
i2001:db8:fffc::2/128
2001:db8:5::2
100 0 i
i 2001:db8:4::2
100 0 i
*> 2001:db8:fffd::2/128
:: 0 32768 i
i2001:db8:fffe::2/128
2001:db8:5::2
0 100 0 i
i 2001:db8:4::2
0 100 0 i
i2001:db8:ffff::2/128
2001:db8:5::2
0 100 0 i
i 2001:db8:4::2
0 100 0 i
Displayed 6 routes and 11 total paths
# show bgp ipv6 unicast 2001:db8:fffa::2/128
BGP routing table entry for 2001:db8:fffa::2/128, version 0
Paths: (2 available, no best path)
Not advertised to any peer
Local
2001:db8:5::2 (inaccessible, import-check enabled) from 2001:db8:5::2 (233.252.0.23)
Origin IGP, localpref 100, invalid, internal
Originator: 233.252.0.23, Cluster list: 233.252.0.10
Last update: Thu May 30 09:39:12 2024
Local
2001:db8:4::2 (inaccessible, import-check enabled) from 2001:db8:4::2 (233.252.0.23)
Origin IGP, localpref 100, invalid, internal
Originator: 233.252.0.23, Cluster list: 233.252.0.10
Last update: Thu May 30 09:39:11 2024
Could you provide these outputs?
show bgp nexthop detail
show ipv6 nht
Sure, I have attached the commands.
# show bgp nexthop detail
Current BGP nexthop cache:
100.64.1.244 valid [IGP metric 100], #paths 5, peer 100.64.1.244
gate 10.16.8.65, if eth0
Last update: Fri May 31 11:02:55 2024
Paths:
1/1 100.64.1.65/32 VRF default flags 0xc10
1/1 100.64.1.64/32 VRF default flags 0xc10
1/1 100.64.1.69/32 VRF default flags 0x418
1/1 100.64.1.67/32 VRF default flags 0x418
1/1 100.64.1.68/32 VRF default flags 0x418
100.64.1.246 valid [IGP metric 100], #paths 5, peer 100.64.1.246
gate 10.16.8.65, if eth0
Last update: Fri May 31 11:02:55 2024
Paths:
1/1 100.64.1.65/32 VRF default flags 0x418
1/1 100.64.1.64/32 VRF default flags 0x418
1/1 100.64.1.69/32 VRF default flags 0xc10
1/1 100.64.1.67/32 VRF default flags 0xc10
1/1 100.64.1.68/32 VRF default flags 0xc10
2001:db8:4::2 invalid, #paths 5, peer 2001:db8:4::2
Last update: Fri May 31 11:02:55 2024
Paths:
2/1 2001:db8:fffe::2/128 VRF default flags 0x400
2/1 2001:db8:ffff::2/128 VRF default flags 0x400
2/1 2001:db8:fffa::2/128 VRF default flags 0x400
2/1 2001:db8:fffc::2/128 VRF default flags 0x400
2/1 2001:db8:fffb::2/128 VRF default flags 0x400
2001:db8:5::2 invalid, #paths 5, peer 2001:db8:5::2
Last update: Fri May 31 11:02:55 2024
Paths:
2/1 2001:db8:fffe::2/128 VRF default flags 0x400
2/1 2001:db8:ffff::2/128 VRF default flags 0x400
2/1 2001:db8:fffa::2/128 VRF default flags 0x400
2/1 2001:db8:fffb::2/128 VRF default flags 0x400
2/1 2001:db8:fffc::2/128 VRF default flags 0x400
# show ipv6 nht
VRF default:
Resolve via default: on
2001:db8:4::2
unresolved
Client list: bgp(fd 29)
2001:db8:5::2
unresolved
Client list: bgp(fd 29)
2001:db8:fffd::2
resolved via local
is directly connected, dummy1 (vrf default)
Client list: bgp(fd 29)
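For reference, a minimal sketch (not FRR code) of why one might expect these two nexthops to resolve as directly connected. It only uses the eth1/eth2 IPv6 addresses from the "ip add show" output above; the helper name `on_link_interface` is hypothetical.

```python
import ipaddress

# Local IPv6 addresses taken from the "ip add show" output in this thread.
LOCAL = {"eth1": "2001:db8:4::3/127", "eth2": "2001:db8:5::3/127"}

# Nexthops that "show ipv6 nht" reports as unresolved.
NEXTHOPS = ["2001:db8:4::2", "2001:db8:5::2"]

def on_link_interface(nexthop, local_addresses):
    """Return the interface whose connected prefix contains the nexthop, or None."""
    addr = ipaddress.ip_address(nexthop)
    for ifname, cidr in local_addresses.items():
        iface = ipaddress.ip_interface(cidr)
        # The nexthop is on-link if it lies in the connected prefix
        # and is not the local address itself.
        if addr in iface.network and addr != iface.ip:
            return ifname
    return None

for nh in NEXTHOPS:
    print(nh, "->", on_link_interface(nh, LOCAL))
# prints:
# 2001:db8:4::2 -> eth1
# 2001:db8:5::2 -> eth2
```

Both nexthops fall inside the connected /127 prefixes, which also appear in the "ip -6 route" output, so "unresolved" suggests the connected routes are not visible to nexthop tracking.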
The remote IPv6 peer addresses should be 2001:db8:4::3 and 2001:db8:5::3.
I'm having this issue as well on about 15 lab servers that moved from 9.1 to 10.0. I can provide full configurations and troubleshooting output if necessary, but I may move most of them back to 9.1 given the number of issues 10.0 has. It looks like the nexthops themselves are incorrect for some reason. Here's a server on 10.0:
ns01-cs9.dal10.trae32566.org(config)# do show ip ro 192.168.1.0/24
Routing entry for 192.168.1.0/24
Known via "bgp", distance 200, metric 0, best
Last update 09:37:49 ago
192.168.253.6 (recursive), weight 1
* 192.168.31.1, via bond0, weight 1
192.168.253.7 (recursive), weight 1
192.168.31.1, via bond0 (duplicate nexthop removed), weight 1
Here's a server on 9.1, on the same exact subnet (it has a different IP, but a fairly similar configuration otherwise):
sec01-cs9.dal10.trae32566.org# show ip ro 192.168.1.0/24
Routing entry for 192.168.1.0/24
Known via "bgp", distance 200, metric 0, best
Last update 11:46:56 ago
192.168.253.6 (recursive), weight 1
* 192.168.31.1, via bond0, weight 1
192.168.253.7 (recursive), weight 1
* 192.168.31.2, via bond0, weight 1
@ton31337 let me know if you want my configuration or output of anything else. Additionally, I wanted to point out that I believe no bgp suppress-duplicate should prevent this behavior even if the nexthops were the same, but it does not appear to work.
Suppress duplicates should not influence this behavior at all, because it applies to outgoing updates only.
@cliff-ha could you describe what the topology looks like in your case? I'm trying to replicate this locally, but I'm still struggling. How are the peers connected, and how is the route originator connected?
Also, what are 10.0.9.193, 195.191.143.10, and 195.191.143.22 in your case?
@cliff-ha just to eliminate one question: can you check whether anything changes if you disable resolving via the default route with no ip nht resolve-via-default? It seems your routes are recursively resolved via the default gateway, which is reached via eth0.
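To illustrate the suspicion above, here is a minimal sketch of recursive nexthop resolution as a longest-prefix match over the non-BGP routes from the "ip route" output earlier in the thread. It is not FRR's implementation; the function `resolve_nexthop` and its `resolve_via_default` flag are hypothetical names that only mimic the effect of the setting.

```python
import ipaddress

# Non-BGP IPv4 routes taken from the "ip route" output posted earlier.
RIB = [
    ("0.0.0.0/0", "eth0"),
    ("10.16.8.64/28", "eth0"),
    ("100.64.1.244/31", "eth2"),
    ("100.64.1.246/31", "eth1"),
]

def resolve_nexthop(nexthop, rib, resolve_via_default=True):
    """Return (prefix, interface) of the longest matching route, or None."""
    addr = ipaddress.ip_address(nexthop)
    best = None
    for prefix, ifname in rib:
        net = ipaddress.ip_network(prefix)
        # Optionally refuse to resolve a nexthop through the default route.
        if net.prefixlen == 0 and not resolve_via_default:
            continue
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, ifname)
    return None if best is None else (str(best[0]), best[1])

# With the connected /31 present, the peer resolves on the peering interface.
print(resolve_nexthop("100.64.1.246", RIB))
# → ('100.64.1.246/31', 'eth1')

# If the connected /31 routes are missing, only the default route matches.
no_connected = [r for r in RIB if not r[0].startswith("100.64.")]
print(resolve_nexthop("100.64.1.246", no_connected))
# → ('0.0.0.0/0', 'eth0')

# With resolution via default disabled, the nexthop becomes unresolvable.
print(resolve_nexthop("100.64.1.246", no_connected, resolve_via_default=False))
# → None
```

The second case matches the reported symptom (BGP nexthops resolved through the eth0 gateway), and the third matches what one would expect to see once resolving via the default route is turned off.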
195.191.143.10 and 195.191.143.22 are the router-ids of the peering devices.
I actually do not know what the 10.0.9.193 IP address is.
I tried adding no ip nht resolve-via-default, and with it the routes are not installed in the routing table at all.
This is how it looks after I added the command.
# show ip route
Codes: K - kernel route, C - connected, L - local, S - static,
R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, A - Babel, F - PBR, f - OpenFabric,
t - Table-Direct,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
K>* 0.0.0.0/0 [0/100] via 10.16.8.65, eth0, 00:03:52
L>* 10.16.8.66/32 is directly connected, eth0, 00:03:52
L>* 100.64.1.66/32 is directly connected, dummy1, 00:03:52
L>* 100.64.1.245/32 is directly connected, eth2, 00:03:52
L>* 100.64.1.247/32 is directly connected, eth1, 00:03:52
# show bgp ipv4 unicast
BGP table version is 16, local router ID is 100.64.1.66, vrf id 0
Default local pref 100, local AS 48854
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
i100.64.1.64/32 100.64.1.246 0 100 0 i
i 100.64.1.244 0 100 0 i
i100.64.1.65/32 100.64.1.246 0 100 0 i
i 100.64.1.244 0 100 0 i
*> 100.64.1.66/32 0.0.0.0 0 32768 i
i100.64.1.67/32 100.64.1.246 100 0 i
i 100.64.1.244 100 0 i
i100.64.1.68/32 100.64.1.246 100 0 i
i 100.64.1.244 100 0 i
i100.64.1.69/32 100.64.1.246 100 0 i
i 100.64.1.244 100 0 i
Displayed 6 routes and 11 total paths
# show bgp ipv4 unicast 100.64.1.68/32
BGP routing table entry for 100.64.1.68/32, version 15
Paths: (2 available, no best path)
Not advertised to any peer
Local
100.64.1.246 (inaccessible, import-check enabled) from 100.64.1.246 (195.191.143.22)
Origin IGP, localpref 100, invalid, internal
Originator: 195.191.143.22, Cluster list: 195.191.143.10
Last update: Tue Jun 4 15:33:47 2024
Local
100.64.1.244 (inaccessible, import-check enabled) from 100.64.1.244 (195.191.143.22)
Origin IGP, localpref 100, invalid, internal
Originator: 195.191.143.22, Cluster list: 195.191.143.10
Last update: Tue Jun 4 15:33:47 2024
That's because you don't have anything in the RIB (no connected routes)... Can you add no ip nht resolve-via-default to frr.conf and restart?
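The missing connected routes can be made explicit with a small sketch that derives the prefixes one would expect from the interface addresses in "ip add show" and compares them with the prefixes listed by "show ip route" above. The helper name `missing_connected` is hypothetical, and both data sets are copied by hand from this thread.

```python
import ipaddress

# IPv4 interface addresses from the "ip add show" output in this thread.
ADDRESSES = {
    "eth0": "10.16.8.66/28",
    "eth1": "100.64.1.247/31",
    "eth2": "100.64.1.245/31",
}

# Prefixes listed by "show ip route" after the change (only K and L entries).
ZEBRA_ROUTES = {
    "0.0.0.0/0",
    "10.16.8.66/32",
    "100.64.1.66/32",
    "100.64.1.245/32",
    "100.64.1.247/32",
}

def missing_connected(addresses, rib_prefixes):
    """Map interface -> expected connected prefix that is absent from the RIB."""
    missing = {}
    for ifname, cidr in addresses.items():
        net = str(ipaddress.ip_interface(cidr).network)
        if net not in rib_prefixes:
            missing[ifname] = net
    return missing

print(missing_connected(ADDRESSES, ZEBRA_ROUTES))
# → {'eth0': '10.16.8.64/28', 'eth1': '100.64.1.246/31', 'eth2': '100.64.1.244/31'}
```

All three connected prefixes exist in the kernel ("ip route") but none appears in the zebra output, which is consistent with the nexthops being reported as inaccessible.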