FRRouting / frr

The FRRouting Protocol Suite

Home Page:https://frrouting.org/

Weird BGP IPv6 ll nh behavior

qeleq opened this issue · comments

Hi all!
FRR version 10.0.

I have two interfaces with IPv6 link-local addresses and an eBGP IPv6 session on each:

7: ens13f0np0.80@ens13f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 18:9b:a5:82:25:e2 brd ff:ff:ff:ff:ff:ff
    inet6 fe80:14:fc01:1::2/64 scope link 
       valid_lft forever preferred_lft forever
    inet6 fe80::1a9b:a5ff:fe82:25e2/64 scope link 
       valid_lft forever preferred_lft forever
10: ens28f0np0.80@ens28f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether e8:eb:d3:b3:54:b6 brd ff:ff:ff:ff:ff:ff
    inet6 fe80:14:fc01:2::2/64 scope link 
       valid_lft forever preferred_lft forever
    inet6 fe80::eaeb:d3ff:feb3:54b6/64 scope link 
       valid_lft forever preferred_lft forever

FRR settings

frr version 10.0
frr defaults traditional
hostname el-fw1.cdnwb.ru
log syslog informational
service integrated-vtysh-config

router bgp 65323
 neighbor SW-LAN peer-group
 neighbor fe80:14:fc01:1::1 peer-group SW-LAN
 neighbor fe80:14:fc01:1::1 interface ens13f0np0.80
 no neighbor fe80:14:fc01:1::1 enforce-first-as
 neighbor fe80:14:fc01:2::1 peer-group SW-LAN
 neighbor fe80:14:fc01:2::1 interface ens28f0np0.80
 no neighbor fe80:14:fc01:2::1 enforce-first-as
 address-family ipv6 unicast
  neighbor SW-LAN activate
  neighbor SW-LAN soft-reconfiguration inbound
  neighbor SW-LAN route-map FROM_LAN_V6 in
  neighbor SW-LAN route-map TO_LAN_V6 out
 exit-address-family

All sessions are UP and stable

Neighbor          V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
fe80:14:fc01:1::1 4      65322      2400      2170       11    0    0 16:27:31            1        0 N/A
fe80:14:fc01:2::1 4      65322      2362      2142       11    0    0 16:27:31            1        0 N/A

Both BGP peers announce one IPv6 prefix to me, 2a03:720:1000::/36:

el-fw1.cdnwb.ru# sh bgp neighbors fe80:14:fc01:1::1 received-routes

BGP table version is 11, local router ID is 192.168.0.1, vrf id 0
Default local pref 100, local AS 65323
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

  Network          Next Hop            Metric LocPrf Weight Path
 *> 2a03:720:1000::/36
                    fe80:14:fc01:1::1
                                                           0 65322 4206000170 57073 i

Total number of prefixes 1

el-fw1.cdnwb.ru# sh bgp neighbors fe80:14:fc01:2::1 received-routes

BGP table version is 11, local router ID is 192.168.0.1, vrf id 0
Default local pref 100, local AS 65323
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

  Network          Next Hop            Metric LocPrf Weight Path
 *> 2a03:720:1000::/36
                    fe80:14:fc01:2::1
                                                           0 65322 4206000170 57073 i

Total number of prefixes 1

So BGP signaling is OK, but routes are installed into the RIB very inconsistently:

el-fw1.cdnwb.ru# sh bgp neighbors fe80:14:fc01:1::1 received-routes detail

BGP table version is 11, local router ID is 192.168.0.1, vrf id 0
Default local pref 100, local AS 65323
BGP routing table entry for 2a03:720:1000::/36, version 11
Paths: (2 available, best #1, table default)
  Not advertised to any peer
  65322 4206000170 57073
    fe80:14:fc01:2::1 from fe80:14:fc01:2::1 (10.255.193.111)
    (fe80:14:fc01:2::1) (used)
      Origin IGP, valid, external, best (First path received)
      Last update: Mon May 27 17:45:41 2024
  65322 4206000170 57073
    fe80:14:fc01:1::1 (inaccessible, import-check enabled) from fe80:14:fc01:1::1 (10.255.193.110)
    (fe80:14:fc01:1::1) (used)
      Origin IGP, invalid, external
      Last update: Mon May 27 17:45:41 2024

Total number of prefixes 1

Question 1: why is the route from peer fe80:14:fc01:2::1 shown among the routes received from peer fe80:14:fc01:1::1?
The second question is probably related to the first: I have a big problem with installing the route into the RIB. Sometimes I have both routes:

B>* 2a03:720:1000::/36 [20/0] via fe80:14:fc01:1::1, ens13f0np0.80, weight 1, 00:11:59
  *                           via fe80:14:fc01:2::1, ens28f0np0.80, weight 1, 00:11:59

Sometimes one:

B>* 2a03:720:1000::/36 [20/0] via fe80:14:fc01:2::1, ens28f0np0.80, weight 1, 16:38:59

Sometimes none :-(

Help me please.

Can you enable debug bgp updates, debug bgp neighbor, debug bgp nht and then send us the logs?

Also, just in case, the output of the following commands would be handy too:

show ipv6 nht
show bgp nexthop
show bgp import-check-table
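
For anyone who needs to capture these outputs repeatedly (for example before and after restarting FRR), here is a minimal Python sketch. It assumes `vtysh` is on the PATH and that the script runs with enough privileges to use it; the helper names `build_vtysh_argv` and `collect_outputs` are hypothetical, not part of FRR.

```python
import subprocess

# The three show commands requested above.
DIAGNOSTIC_COMMANDS = [
    "show ipv6 nht",
    "show bgp nexthop",
    "show bgp import-check-table",
]


def build_vtysh_argv(command):
    """Return the argv for running one command non-interactively via vtysh -c."""
    return ["vtysh", "-c", command]


def collect_outputs(commands, runner=subprocess.run):
    """Run each command and return a dict mapping command -> captured stdout.

    `runner` is injectable so the function can be exercised without FRR installed.
    """
    outputs = {}
    for command in commands:
        result = runner(
            build_vtysh_argv(command),
            capture_output=True,
            text=True,
            check=False,
        )
        outputs[command] = result.stdout
    return outputs


# Example use on the router itself (left as a comment because it needs vtysh):
# for cmd, out in collect_outputs(DIAGNOSTIC_COMMANDS).items():
#     print(f"### {cmd}\n{out}")
```

Saving one snapshot per restart makes it easier to compare the states in which both, one, or neither next hop is usable.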

Done

VRF default:
 Resolve via default: on
fe80:14:fc01:1::1(Connected)
 resolved via connected
 is directly connected, ens13f0np0.80 (vrf default)
 Client list: bgp(fd 18)
fe80:14:fc01:2::1(Connected)
 resolved via connected
 is directly connected, ens28f0np0.80 (vrf default)
 Client list: bgp(fd 18)
el-fw1.cdnwb.ru# show bgp nexthop
Current BGP nexthop cache:
 fe80:14:fc01:1::1 valid [IGP metric 0], #paths 0, peer fe80:14:fc01:1::1
  if ens13f0np0.80
  Last update: Mon May 27 16:26:58 2024
 fe80:14:fc01:2::1 valid [IGP metric 0], #paths 1, peer fe80:14:fc01:2::1
  if ens28f0np0.80
  Last update: Mon May 27 16:35:25 2024
 fe80:14:fc01:1::1 invalid, #paths 1
  Must be Connected
  Last update: Wed May 22 17:20:29 2024
el-fw1.cdnwb.ru# show bgp import-check-table
Current BGP import check cache:
el-fw1.cdnwb.ru#

debug.txt

You have something strange in the next-hop cache:

 fe80:14:fc01:1::1 valid [IGP metric 0], #paths 0, peer fe80:14:fc01:1::1
  if ens13f0np0.80
  Last update: Mon May 27 16:26:58 2024

 fe80:14:fc01:1::1 invalid, #paths 1
  Must be Connected
  Last update: Wed May 22 17:20:29 2024

Two entries for the same next-hop, but one is invalid, and its last update is much older. Does this bad behavior happen even right after the router is restarted, or does it start only after some time?
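
To spot this kind of duplicate without reading the cache by eye, a small Python sketch like the following could scan the text of `show bgp nexthop`. It is only an illustration: the function name `find_conflicting_nexthops` is hypothetical, and the regular expression assumes the entry layout shown in this thread, which may differ in other FRR versions.

```python
import re
from collections import defaultdict

# First line of a cache entry, e.g.
#   " fe80:14:fc01:1::1 valid [IGP metric 0], #paths 0, peer fe80:14:fc01:1::1"
#   " fe80:14:fc01:1::1 invalid, #paths 1"
ENTRY_RE = re.compile(
    r"^\s*(?P<addr>[0-9a-fA-F:.]+)\s+(?P<state>valid|invalid)\b.*?#paths\s+(?P<paths>\d+)"
)

SAMPLE = """\
Current BGP nexthop cache:
 fe80:14:fc01:1::1 valid [IGP metric 0], #paths 0, peer fe80:14:fc01:1::1
  if ens13f0np0.80
  Last update: Mon May 27 16:26:58 2024
 fe80:14:fc01:2::1 valid [IGP metric 0], #paths 1, peer fe80:14:fc01:2::1
  if ens28f0np0.80
  Last update: Mon May 27 16:35:25 2024
 fe80:14:fc01:1::1 invalid, #paths 1
  Must be Connected
  Last update: Wed May 22 17:20:29 2024
"""


def find_conflicting_nexthops(text):
    """Return next hops that appear with more than one distinct state.

    The result maps each such address to its list of (state, path_count)
    tuples, in the order the entries appear in the output.
    """
    entries = defaultdict(list)
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries[match.group("addr")].append(
                (match.group("state"), int(match.group("paths")))
            )
    return {
        addr: seen
        for addr, seen in entries.items()
        if len({state for state, _ in seen}) > 1
    }


conflicts = find_conflicting_nexthops(SAMPLE)
print(conflicts)
```

On the output quoted above, only fe80:14:fc01:1::1 is flagged, and the path is attached to its invalid entry rather than to the valid one.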

I don't know whether this is related.
I had a similar issue after restoring a config from 9.1 onto 10.0 (where enforce-first-as is the default). Entering no neighbor XXX enforce-first-as still left a weirdly low number of received-routes. clear ip bgp did not help either; it was only solved by neighbor XXX shutdown followed by no shutdown.

So the no neighbor XXX enforce-first-as command needs a shutdown and no shutdown of the peer before it is applied.

It's a new router with a new IPv6 design. I have had this problem ever since the FRR and host configurations were completed. There was one period of about 15 minutes when everything worked. It seems to me that the situation can change after FRR is restarted: both next hops can become invalid, for example, or both can work; anything is possible. By the way, the next-hop table is now:
el-fw1.cdnwb.ru# sh bgp nexthop
Current BGP nexthop cache:
 fe80:14:fc01:1::1 valid [IGP metric 0], #paths 0, peer fe80:14:fc01:1::1
  if ens13f0np0.80
  Last update: Mon May 27 16:26:58 2024
 fe80:14:fc01:2::1 valid [IGP metric 0], #paths 1, peer fe80:14:fc01:2::1
  if ens28f0np0.80
  Last update: Mon May 27 16:35:25 2024
 fe80:14:fc01:1::1 invalid, #paths 1
  Must be Connected
  Last update: Wed May 22 17:20:29 2024
el-fw1.cdnwb.ru#

Sorry, the neighbor shutdown and no shutdown workaround didn't help me.