FRRouting / frr

The FRRouting Protocol Suite

Home Page: https://frrouting.org/

Fails to create VRF-bound BGP instance with different ASN than default, if VRF has an L3VNI

toreanderson opened this issue · comments

Description

When trying to create a VRF-bound BGP instance that uses a different ASN than the default BGP instance, FRR refuses with the error message "BGP is already running; AS is X", where X is the ASN of the default BGP instance.

This only happens if the VRF has an L3VNI.

Version

FRRouting 9.1 (xps13) on Linux(6.8.9-300.fc40.x86_64).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--runstatedir=/run' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sbindir=/usr/libexec/frr' '--sysconfdir=/etc/frr' '--libdir=/usr/lib64/frr' '--libexecdir=/usr/libexec/frr' '--localstatedir=/run/frr' '--enable-multipath=64' '--enable-vtysh=yes' '--disable-ospfclient' '--disable-ospfapi' '--enable-snmp=agentx' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-rtadv' '--disable-exampledir' '--enable-systemd=yes' '--enable-static=no' '--disable-ldpd' '--disable-babeld' '--with-moduledir=/usr/lib64/frr/modules' '--with-crypto=openssl' '--enable-fpm' '--enable-grpc' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig' 'CC=gcc' 'CXX=g++' 'LT_SYS_LIBRARY_PATH=/usr/lib64:'

How to reproduce

Starting with an unconfigured FRR instance already running, issue the following commands:

ip link add up name vrf100 type vrf table 100
ip link add up name br100 master vrf100 type bridge
ip link add up vni10100 type vxlan id 10100
ip link set vni10100 master br100

vtysh <<EOF
configure

vrf vrf100
 vni 10100
exit-vrf

router bgp 50
 address-family l2vpn evpn
  advertise-all-vni
 exit-address-family
exit

router bgp 100 vrf vrf100
exit
EOF

Expected behavior

The commands should complete without issue.

(Note that the FRR documentation makes it clear that using different ASNs in different VRFs is supposed to work.)

Actual behavior

The VRF-bound BGP instance is not created. The script fails with the following output:

xps13(config)# router bgp 100 vrf vrf100
BGP is already running; AS is 50

Additional context

If the L3VNI is bound to the VRF after the BGP instance is created, it works. In other words, after changing the script as follows, the configuration successfully loads:

configure

router bgp 50
 address-family l2vpn evpn
  advertise-all-vni
 exit-address-family
exit

router bgp 100 vrf vrf100
exit

vrf vrf100
 vni 10100
exit-vrf

However, if this configuration is made persistent with write, the vrf section is placed above the router bgp sections in the generated configuration file, so the configuration fails to load correctly when FRR (re)starts.
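To spot a saved configuration that has this ordering, a rough check along these lines may help. It is only a heuristic sketch: check_vni_order is a hypothetical helper, not part of FRR, and it does not compare ASNs or check for advertise-all-vni, so it can flag configurations that would load fine.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FRR): read an FRR configuration on stdin
# and report each VRF whose "vni" line appears before the matching
# "router bgp <asn> vrf <name>" line.
check_vni_order() {
    awk '
        /^vrf /                  { cur = $2 }          # entering a "vrf NAME" block
        /^exit-vrf/              { cur = "" }          # leaving the block
        cur != "" && $1 == "vni" { has_vni[cur] = 1 }  # this VRF has a VNI bound
        $1 == "router" && $2 == "bgp" && $4 == "vrf" {
            if ($5 in has_vni)
                print "vrf " $5 ": vni precedes router bgp " $3
        }
    '
}
```

Assuming the integrated configuration lives at /etc/frr/frr.conf (an assumption based on the sysconfdir shown in the build flags), it could be run as: check_vni_order < /etc/frr/frr.conf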

Checklist

  • I have searched the open issues for this bug.
  • I have not included sensitive information in this report.

Related #9537?

It seems to me that the conditions for that issue are different. In particular, there is no EVPN/L3VNI in that configuration, which is a requirement for the bug to trigger in mine. That said, it could of course be that there is a single root cause at play here that can be triggered in multiple ways.

Strange. Just double-checking, is the L3VNI visible to FRR for you prior to the attempted creation of the BGP instance?

xps13# show evpn vni 
VNI        Type VxLAN IF              # MACs   # ARPs   # Remote VTEPs  Tenant VRF                           
10100      L3   vni10100              0        0        n/a             vrf100                               
xps13# configure 
xps13(config)# router bgp 100 vrf vrf100
BGP is already running; AS is 50

Not sure if it matters, but I'm running kernel 6.8.9-300.fc40.x86_64 (Fedora 40).

Yes, pardon, master is affected also. Will check what's going on, and let you know.

Overall yes, this is technically the same issue as #9537.

TL;DR: When we configure advertise-all-vni (in this case), a new BGP instance is created with the name vrf100 and ASN 50. Next, when we create router bgp 100 vrf vrf100, we look for a BGP instance with the same name and find it, but the ASNs differ: 50 vs. 100.
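The lookup described above can be mimicked with a toy model. This is not FRR code: the bgp_instances table and the bgp_get helper are made up for illustration, and the only behavior modeled is "look up by instance name, then compare ASNs".

```shell
#!/usr/bin/env bash
# Toy model only, NOT FRR code. Maps BGP instance name -> ASN.
declare -A bgp_instances

# Hypothetical helper: create an instance, or reuse an existing one found by name.
bgp_get() {
    local name=$1 asn=$2
    local existing=${bgp_instances[$name]:-}
    if [[ -n $existing && $existing != "$asn" ]]; then
        # Name matched, ASN did not: the situation reported in this issue.
        echo "BGP is already running; AS is $existing"
        return 1
    fi
    bgp_instances[$name]=$asn
    echo "ok: $name AS $asn"
}

bgp_get default 50          # router bgp 50
bgp_get vrf100 50           # instance created implicitly for the L3VNI VRF
bgp_get vrf100 100 || true  # router bgp 100 vrf vrf100: same name, different ASN
```

In this model the third call is rejected because the name vrf100 is already taken by an instance carrying the default ASN.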

@toreanderson are you able to test a patch (compile)?

Assuming the build process is relatively straightforward (or well documented if not), certainly.

The patch is in #16159. You could also wait for the artifacts to be compiled and install the .deb or .rpm, if CI passes of course.

Tested build artifacts on Debian 12:

  • frr_10.1-dev-master-ga24c805-20240604.084942-1~deb12u1_amd64.deb
  • frr_10.1-dev-PR16159-g755fea3-20240604.123658-1~deb12u1_amd64.deb

I can reproduce the issue on the former, but not on the latter. LGTM! 👌