FRRouting / frr

The FRRouting Protocol Suite

Home Page: https://frrouting.org/

Problems with BGP routes on FRR 10.0-01

cliff-ha opened this issue

Description

After upgrading FRR from version 9.1 to 10.0-01, routes are no longer installed correctly on the server.
As soon as we downgrade FRR, everything starts working again.
The server has multiple interfaces (eth0, eth1, eth2). The BGP peers are on eth1 and eth2, and the server receives the same routes over both interfaces, but the routes are installed as if they had been received on eth0:

Routing entry for 100.64.1.68/32
  Known via "bgp", distance 200, metric 0, best
  Last update 00:12:29 ago
    100.64.1.248 (recursive), weight 1
  *   10.0.9.193, via eth0, weight 1
    100.64.1.250 (recursive), weight 1
      10.0.9.193, via eth0 (duplicate nexthop removed), weight 1
BGP routing table entry for 100.64.1.68/32, version 33
Paths: (2 available, best #2, table default)
  Not advertised to any peer
  Local
    100.64.1.250 (metric 100) from 100.64.1.250 (195.191.143.22)
      Origin IGP, localpref 100, valid, internal, multipath
      Originator: 195.191.143.22, Cluster list: 195.191.143.10
      Last update: Wed May 29 10:19:09 2024
  Local
    100.64.1.248 (metric 100) from 100.64.1.248 (195.191.143.22)
      Origin IGP, localpref 100, valid, internal, multipath, best (Neighbor IP)
      Originator: 195.191.143.22, Cluster list: 195.191.143.10
      Last update: Wed May 29 10:18:46 2024
eth1            up      default         100.64.1.251/31
eth2            up      default         100.64.1.249/31

With IPv6, the routes are simply marked as invalid because the next hop is reported as inaccessible, even though the peer IP is a directly connected neighbor.

Version

FRRouting 10.0 (#serverName) on Linux(5.14.0-427.18.1.el9_4.x86_64).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-static' '--disable-werror' '--enable-multipath=256' '--enable-vtysh' '--enable-ospfclient' '--enable-ospfapi' '--enable-rtadv' '--enable-ldpd' '--enable-pimd' '--enable-pim6d' '--enable-pbrd' '--enable-nhrpd' '--enable-eigrpd' '--enable-babeld' '--enable-vrrpd' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-fpm' '--enable-watchfrr' '--disable-bgp-vnc' '--enable-isisd' '--enable-rpki' '--enable-bfdd' '--enable-pathd' '--enable-snmp' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig' 'CC=gcc' 'CXX=g++' 'LT_SYS_LIBRARY_PATH=/usr/lib64:'

How to reproduce

Set up a server with multiple interfaces (eth0, eth1, eth2) and establish BGP peerings on eth1 and eth2.

Expected behavior

Routes are installed via the interfaces on which they are received.

Actual behavior

Routes are installed as if they had been received on eth0.

Additional context

No response

Checklist

  • I have searched the open issues for this bug.
  • I have not included sensitive information in this report.

Could you also show the configuration?

Sure.
The configuration looks like this:
router bgp 65500
 bgp router-id 100.64.1.65
 no bgp default ipv4-unicast
 neighbor 100.64.1.248 remote-as 65500
 neighbor 100.64.1.250 remote-as 65500
 neighbor 2001:db8:2::2 remote-as 65500
 neighbor 2001:db8:3::2 remote-as 65500
 !
 address-family ipv4 unicast
  network 100.64.1.65/32
  neighbor 100.64.1.248 activate
  neighbor 100.64.1.248 prefix-list pl-ipv4-wrt-in in
  neighbor 100.64.1.248 prefix-list pl-ipv4-wrt-out out
  neighbor 100.64.1.250 activate
  neighbor 100.64.1.250 prefix-list pl-ipv4-wrt-in in
  neighbor 100.64.1.250 prefix-list pl-ipv4-wrt-out out
 exit-address-family
 !
 address-family ipv6 unicast
  network 2001:db8:fffe::2/128
  neighbor 2001:db8:2::2 activate
  neighbor 2001:db8:2::2 prefix-list pl-ipv6-wrt-in in
  neighbor 2001:db8:2::2 prefix-list pl-ipv6-wrt-out out

ip prefix-list pl-ipv4-wrt-in seq 10 permit 100.64.1.64/27 ge 32 le 32
ip prefix-list pl-ipv4-wrt-out seq 10 permit 100.64.1.65/32

ipv6 prefix-list pl-ipv6-wrt-out seq 10 permit 2001:db8:fffe::2/128
ipv6 prefix-list pl-ipv6-wrt-in seq 10 permit 2001:db8:8000::/49 ge 128 le 128
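For reference, a prefix-list entry of the form "permit PREFIX ge G le L" matches a route that lies inside PREFIX and whose prefix length is between G and L; a route that matches no entry hits the implicit deny at the end of the list. So pl-ipv4-wrt-in only accepts /32 host routes inside 100.64.1.64/27. A minimal Python sketch of that matching rule, using the standard ipaddress module; the function names are hypothetical, only permit entries are modelled, and this is not FRR's implementation:

```python
import ipaddress


# Hypothetical helper modelling one "permit PREFIX ge GE le LE" entry:
# the route must lie inside PREFIX and its length must be within [GE, LE].
def entry_matches(route: str, prefix: str, ge: int, le: int) -> bool:
    net = ipaddress.ip_network(route)
    base = ipaddress.ip_network(prefix)
    if net.version != base.version:
        return False  # an IPv4 entry never matches an IPv6 route and vice versa
    return net.subnet_of(base) and ge <= net.prefixlen <= le


# ip prefix-list pl-ipv4-wrt-in seq 10 permit 100.64.1.64/27 ge 32 le 32
PL_IPV4_WRT_IN = [("100.64.1.64/27", 32, 32)]


def permitted(route: str, entries=PL_IPV4_WRT_IN) -> bool:
    # A route that matches no permit entry falls through to the implicit deny.
    return any(entry_matches(route, p, ge, le) for p, ge, le in entries)


if __name__ == "__main__":
    for r in ["100.64.1.68/32", "100.64.1.64/27", "100.64.1.100/32"]:
        print(r, "permit" if permitted(r) else "deny")
```

Running it prints permit for 100.64.1.68/32 and deny for the other two: the /27 itself fails the length check, and 100.64.1.100/32 lies outside 100.64.1.64/27.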

Could you also show the interface configuration ("ip add show")? I want to see the exact configuration, including eth0.

Sure, I have also added some additional commands.
As you can see below, the IPv4 routes are installed as reachable via eth0, while the IPv6 routes are not installed at all.

#ip add show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:83:b2:af brd ff:ff:ff:ff:ff:ff
    altname enp11s0
    altname ens192
    inet 10.16.8.66/28 brd 10.16.8.79 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe83:b2af/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:b1:2c:c8 brd ff:ff:ff:ff:ff:ff
    altname enp19s0
    altname ens224
    inet 100.64.1.247/31 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 2001:db8:4::3/127 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::fa50:8295:449f:c7c3/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:b1:e0:18 brd ff:ff:ff:ff:ff:ff
    altname enp27s0
    altname ens256
    inet 100.64.1.245/31 scope global noprefixroute eth2
       valid_lft forever preferred_lft forever
    inet6 2001:db8:5::3/127 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::1513:a0d6:c642:1d9d/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
#ip route
default via 10.16.8.65 dev eth0 proto static metric 100
10.16.8.64/28 dev eth0 proto kernel scope link src 10.16.8.66 metric 100
100.64.1.64 nhid 25 via 10.16.8.65 dev eth0 proto bgp metric 20
100.64.1.65 nhid 25 via 10.16.8.65 dev eth0 proto bgp metric 20
100.64.1.67 nhid 25 via 10.16.8.65 dev eth0 proto bgp metric 20
100.64.1.68 nhid 25 via 10.16.8.65 dev eth0 proto bgp metric 20
100.64.1.69 nhid 25 via 10.16.8.65 dev eth0 proto bgp metric 20
100.64.1.244/31 dev eth2 proto kernel scope link src 100.64.1.245 metric 102
100.64.1.246/31 dev eth1 proto kernel scope link src 100.64.1.247 metric 101
# ip -6 route
::1 dev lo proto kernel metric 256 pref medium
2001:db8:4::2/127 dev eth1 proto kernel metric 101 pref medium
2001:db8:5::2/127 dev eth2 proto kernel metric 102 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev eth1 proto kernel metric 1024 pref medium
fe80::/64 dev eth2 proto kernel metric 1024 pref medium
#show bgp ipv6 unicast
BGP table version is 1, local router ID is 100.64.1.66, vrf id 0
Default local pref 100, local AS 65500
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
   i2001:db8:fffa::2/128
                    2001:db8:5::2
                                                  100      0 i
   i                 2001:db8:4::2
                                                  100      0 i
   i2001:db8:fffb::2/128
                    2001:db8:5::2
                                                  100      0 i
   i                 2001:db8:4::2
                                                  100      0 i
   i2001:db8:fffc::2/128
                    2001:db8:5::2
                                                  100      0 i
   i                 2001:db8:4::2
                                                  100      0 i
 *> 2001:db8:fffd::2/128
                    ::                       0         32768 i
   i2001:db8:fffe::2/128
                    2001:db8:5::2
                                             0    100      0 i
   i                 2001:db8:4::2
                                             0    100      0 i
   i2001:db8:ffff::2/128
                    2001:db8:5::2
                                             0    100      0 i
   i                 2001:db8:4::2
                                             0    100      0 i

Displayed 6 routes and 11 total paths
# show bgp ipv6 unicast 2001:db8:fffa::2/128
BGP routing table entry for 2001:db8:fffa::2/128, version 0
Paths: (2 available, no best path)
  Not advertised to any peer
  Local
    2001:db8:5::2 (inaccessible, import-check enabled) from 2001:db8:5::2 (233.252.0.23)
      Origin IGP, localpref 100, invalid, internal
      Originator: 233.252.0.23, Cluster list: 233.252.0.10
      Last update: Thu May 30 09:39:12 2024
  Local
    2001:db8:4::2 (inaccessible, import-check enabled) from 2001:db8:4::2 (233.252.0.23)
      Origin IGP, localpref 100, invalid, internal
      Originator: 233.252.0.23, Cluster list: 233.252.0.10
      Last update: Thu May 30 09:39:11 2024
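The interface addresses above place each BGP next hop (100.64.1.244, 100.64.1.246, 2001:db8:4::2, 2001:db8:5::2) inside a /31 or /127 that is connected on eth1 or eth2, which is why "inaccessible" is unexpected here. A minimal Python sketch of that subnet check, with the addresses copied from the "ip add show" output; the helper name is hypothetical, and it only tests plain subnet membership, not zebra's next-hop tracking logic:

```python
import ipaddress
from typing import Optional

# Local addresses copied from the "ip add show" output (eth1 and eth2).
LOCAL_ADDRESSES = {
    "eth1": ["100.64.1.247/31", "2001:db8:4::3/127"],
    "eth2": ["100.64.1.245/31", "2001:db8:5::3/127"],
}


def connected_interface(nexthop: str) -> Optional[str]:
    """Return the interface whose connected subnet contains nexthop, or None."""
    addr = ipaddress.ip_address(nexthop)
    for ifname, cidrs in LOCAL_ADDRESSES.items():
        for cidr in cidrs:
            # ip_interface keeps the host address; .network is the connected subnet.
            # Membership is always False across IPv4/IPv6, so mixing them is safe.
            if addr in ipaddress.ip_interface(cidr).network:
                return ifname
    return None


if __name__ == "__main__":
    for nh in ["100.64.1.246", "100.64.1.244", "2001:db8:4::2", "2001:db8:5::2"]:
        print(nh, "->", connected_interface(nh))
```

Every next hop maps to eth1 or eth2, while the eth0 gateway 10.16.8.65 maps to neither.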

Could you provide these outputs?

show bgp nexthop detail
show ipv6 nht

Sure, I have attached the commands.

# show bgp nexthop detail
Current BGP nexthop cache:
 100.64.1.244 valid [IGP metric 100], #paths 5, peer 100.64.1.244
  gate 10.16.8.65, if eth0
  Last update: Fri May 31 11:02:55 2024
  Paths:
    1/1 100.64.1.65/32 VRF default flags 0xc10
    1/1 100.64.1.64/32 VRF default flags 0xc10
    1/1 100.64.1.69/32 VRF default flags 0x418
    1/1 100.64.1.67/32 VRF default flags 0x418
    1/1 100.64.1.68/32 VRF default flags 0x418
 100.64.1.246 valid [IGP metric 100], #paths 5, peer 100.64.1.246
  gate 10.16.8.65, if eth0
  Last update: Fri May 31 11:02:55 2024
  Paths:
    1/1 100.64.1.65/32 VRF default flags 0x418
    1/1 100.64.1.64/32 VRF default flags 0x418
    1/1 100.64.1.69/32 VRF default flags 0xc10
    1/1 100.64.1.67/32 VRF default flags 0xc10
    1/1 100.64.1.68/32 VRF default flags 0xc10
 2001:db8:4::2 invalid, #paths 5, peer 2001:db8:4::2
  Last update: Fri May 31 11:02:55 2024
  Paths:
    2/1 2001:db8:fffe::2/128 VRF default flags 0x400
    2/1 2001:db8:ffff::2/128 VRF default flags 0x400
    2/1 2001:db8:fffa::2/128 VRF default flags 0x400
    2/1 2001:db8:fffc::2/128 VRF default flags 0x400
    2/1 2001:db8:fffb::2/128 VRF default flags 0x400
 2001:db8:5::2 invalid, #paths 5, peer 2001:db8:5::2
  Last update: Fri May 31 11:02:55 2024
  Paths:
    2/1 2001:db8:fffe::2/128 VRF default flags 0x400
    2/1 2001:db8:ffff::2/128 VRF default flags 0x400
    2/1 2001:db8:fffa::2/128 VRF default flags 0x400
    2/1 2001:db8:fffb::2/128 VRF default flags 0x400
    2/1 2001:db8:fffc::2/128 VRF default flags 0x400
# show ipv6 nht
VRF default:
 Resolve via default: on
2001:db8:4::2
 unresolved
 Client list: bgp(fd 29)
2001:db8:5::2
 unresolved
 Client list: bgp(fd 29)
2001:db8:fffd::2
 resolved via local
 is directly connected, dummy1 (vrf default)
 Client list: bgp(fd 29)

The remote IPv6 peer addresses should be 2001:db8:4::3 and 2001:db8:5::3.

I'm having this issue as well, on ~15 lab servers moving from 9.1 to 10.0. I can provide full configurations and troubleshooting output if necessary, but I may move most of them back to 9.1 given the number of issues in 10.0. It looks like the next hops themselves are incorrect for some reason. Here's a server on 10.0:

ns01-cs9.dal10.trae32566.org(config)# do show ip ro 192.168.1.0/24
Routing entry for 192.168.1.0/24
  Known via "bgp", distance 200, metric 0, best
  Last update 09:37:49 ago
    192.168.253.6 (recursive), weight 1
  *   192.168.31.1, via bond0, weight 1
    192.168.253.7 (recursive), weight 1
      192.168.31.1, via bond0 (duplicate nexthop removed), weight 1

Here's a server on 9.1, on the same exact subnet (it has a different IP, but a fairly similar configuration otherwise):

sec01-cs9.dal10.trae32566.org# show ip ro 192.168.1.0/24
Routing entry for 192.168.1.0/24
  Known via "bgp", distance 200, metric 0, best
  Last update 11:46:56 ago
    192.168.253.6 (recursive), weight 1
  *   192.168.31.1, via bond0, weight 1
    192.168.253.7 (recursive), weight 1
  *   192.168.31.2, via bond0, weight 1

@ton31337, let me know if you want my configuration or the output of anything else. Additionally, I wanted to point out that I believe no bgp suppress-duplicate should prevent this behavior even if the next hops were the same, but it does not appear to work.

> Additionally I wanted to point out that I believe no bgp suppress-duplicate should prevent this behavior even if the nexthops were the same, but it does not appear to work.

Suppressing duplicates should not influence this behavior at all, because it applies only to outgoing updates.

@cliff-ha, could you describe your topology? I'm trying to reproduce this locally but am still struggling. How are the peers connected, and how is the route originator connected?

This is a very simple drawing of how the devices are connected:
[Image: Screenshot 2024-06-04 at 09 18 59]

But what are 10.0.9.193, 195.191.143.10, and 195.191.143.22 in your case?

@cliff-ha, just to eliminate one possibility: can you check whether anything changes if you disable next-hop resolution via the default route (no ip nht resolve-via-default)? It seems your routes are recursively resolved via the default gateway, which is reachable via eth0.

195.191.143.10 and 195.191.143.22 are the router IDs of the peering devices.
I actually don't know what the 10.0.9.193 IP address is.

I tried adding no ip nht resolve-via-default, and then the routes are not installed in the routing table at all.
This is how it looks after adding the command:

# show ip route
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, A - Babel, F - PBR, f - OpenFabric,
       t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [0/100] via 10.16.8.65, eth0, 00:03:52
L>* 10.16.8.66/32 is directly connected, eth0, 00:03:52
L>* 100.64.1.66/32 is directly connected, dummy1, 00:03:52
L>* 100.64.1.245/32 is directly connected, eth2, 00:03:52
L>* 100.64.1.247/32 is directly connected, eth1, 00:03:52
# show bgp ipv4 unicast
BGP table version is 16, local router ID is 100.64.1.66, vrf id 0
Default local pref 100, local AS 48854
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
   i100.64.1.64/32   100.64.1.246             0    100      0 i
   i                 100.64.1.244             0    100      0 i
   i100.64.1.65/32   100.64.1.246             0    100      0 i
   i                 100.64.1.244             0    100      0 i
 *> 100.64.1.66/32   0.0.0.0                  0         32768 i
   i100.64.1.67/32   100.64.1.246                  100      0 i
   i                 100.64.1.244                  100      0 i
   i100.64.1.68/32   100.64.1.246                  100      0 i
   i                 100.64.1.244                  100      0 i
   i100.64.1.69/32   100.64.1.246                  100      0 i
   i                 100.64.1.244                  100      0 i

Displayed 6 routes and 11 total paths
# show bgp ipv4 unicast  100.64.1.68/32
BGP routing table entry for 100.64.1.68/32, version 15
Paths: (2 available, no best path)
  Not advertised to any peer
  Local
    100.64.1.246 (inaccessible, import-check enabled) from 100.64.1.246 (195.191.143.22)
      Origin IGP, localpref 100, invalid, internal
      Originator: 195.191.143.22, Cluster list: 195.191.143.10
      Last update: Tue Jun  4 15:33:47 2024
  Local
    100.64.1.244 (inaccessible, import-check enabled) from 100.64.1.244 (195.191.143.22)
      Origin IGP, localpref 100, invalid, internal
      Originator: 195.191.143.22, Cluster list: 195.191.143.10
      Last update: Tue Jun  4 15:33:47 2024

That's because there is nothing in the RIB (no connected routes). Can you add no ip nht resolve-via-default to frr.conf and restart?
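The two observations in this thread (next hops resolved via eth0 with resolve-via-default on, "inaccessible" with it off) both follow from a RIB that lacks the connected /31 routes, as the "show ip route" output above shows. A hypothetical longest-prefix-match sketch of that reasoning, not zebra's NHT implementation: it assumes the RIB holds only the kernel default route via 10.16.8.65 on eth0, and treats resolve_via_default=False as the effect of no ip nht resolve-via-default:

```python
import ipaddress
from typing import Dict, Optional, Tuple

# Hypothetical RIB entry: (gateway, interface); gateway is None for a connected route.
Route = Tuple[Optional[str], str]


def resolve(nexthop: str, rib: Dict[str, Route], resolve_via_default: bool) -> Optional[Route]:
    """Longest-prefix match of a BGP next hop against a simplified RIB.

    With resolve_via_default=False the /0 route is skipped, which is the
    assumed effect of "no ip nht resolve-via-default" in this model.
    """
    addr = ipaddress.ip_address(nexthop)
    best: Optional[Route] = None
    best_len = -1
    for prefix, route in rib.items():
        net = ipaddress.ip_network(prefix)
        if addr not in net:
            continue
        if net.prefixlen == 0 and not resolve_via_default:
            continue
        if net.prefixlen > best_len:
            best, best_len = route, net.prefixlen
    return best


# Like the reporter's RIB: only the kernel default route, no connected /31s.
RIB_WITHOUT_CONNECTED: Dict[str, Route] = {"0.0.0.0/0": ("10.16.8.65", "eth0")}
# The same RIB if the connected route for eth1 were present.
RIB_WITH_CONNECTED: Dict[str, Route] = {**RIB_WITHOUT_CONNECTED, "100.64.1.246/31": (None, "eth1")}

if __name__ == "__main__":
    for label, rib in [("without connected", RIB_WITHOUT_CONNECTED),
                       ("with connected", RIB_WITH_CONNECTED)]:
        for rvd in (True, False):
            print(label, "resolve-via-default" if rvd else "no resolve-via-default",
                  "->", resolve("100.64.1.246", rib, rvd))
```

Without the connected route, the peer 100.64.1.246 resolves via 10.16.8.65 on eth0 or not at all; once the connected /31 is present, it resolves via eth1 regardless of the resolve-via-default setting.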