EVPN type-2 MACIP not imported into other VRFs by originating node, only at receivers
toreanderson opened this issue
Description
I am not 100% certain whether or not this is a bug, or expected behaviour. In any case, it leads to suboptimal routing, so if it is not a bug, it could be considered a feature request at least.
When importing routes from an EVPN VRF into another VRF (such as the default VRF), type-2 MACIP routes (containing an IP address) do not get imported into the target VRF on the node that originated the type-2 MACIP route (i.e., the one that has the MAC/IP in the type-2 route locally attached). They are, however, imported on all other nodes.
This leads to suboptimal routing, as traffic from the external network is always directed to a node where the MAC/IP destination is not present. From there it is encapsulated in VXLAN and sent to the target node.
It would be better if the node where the MAC/IP is present also imported the route into the target VRF and re-advertised it there, as that path would be preferred by the external network (due to a shorter AS path).
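To make the AS-path argument concrete, here is a small shell sketch that counts the ASNs in a path and compares the path frrtest currently sees for 192.168.0.2 ("3 2", taken from the output under "Actual behavior") with the one it would see if frrtest2 advertised the route itself. The path "2" for the direct case and the helper name as_path_len are assumptions for illustration, and the comparison presumes the earlier best-path tie-breakers (weight, local preference) are equal.

```shell
# Count the ASNs in a space-separated AS path (hypothetical helper).
as_path_len() {
    # Unquoted on purpose: word splitting turns each ASN into one argument.
    set -- $1
    echo "$#"
}

via_frrtest3="3 2"   # path frrtest currently has for 192.168.0.2
via_frrtest2="2"     # ASSUMED path if frrtest2 advertised the route directly

if [ "$(as_path_len "$via_frrtest2")" -lt "$(as_path_len "$via_frrtest3")" ]; then
    echo "direct path via frrtest2 would win on AS-path length"
fi
```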
Version
FRRouting 10.1-dev (frrtest) on Linux(6.1.0-21-amd64).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
'--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--sbindir=/usr/lib/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--enable-scripting' '--enable-pim6d' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' '--enable-sharpd' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'
(from frr_10.1-dev-master-ga24c805-20240604.084942-1~deb12u1_amd64.deb)
How to reproduce
To illustrate, I've set up a lab with three Debian 12 nodes connected in a triangle (full mesh between interfaces ens7 and ens8 on each node).
One of the nodes, frrtest, does not participate in EVPN; it represents the external network.
The other two nodes, frrtest2 and frrtest3, represent two EVPN routers with ASN 2 and 3, with a single L3VNI (10) bound to VRF 10, and an IRB on L2VNI 100. Static MAC/ARP entries (for 192.168.0.2 and 192.168.0.3) are used to generate type-2 routes; they represent downstream hosts on the L2VNI. The default VRF imports routes from VRF 10.
These are the scripts I use to configure the three nodes from scratch:
frrtest1
vtysh <<EOF
configure
interface lo
ip address 10.0.0.1/32
interface ens7
no shutdown
interface ens8
no shutdown
router bgp 1
no bgp ebgp-requires-policy
neighbor ens7 interface remote-as external
neighbor ens8 interface remote-as external
address-family ipv4 unicast
network 10.0.0.1/32
neighbor ens7 activate
neighbor ens8 activate
exit-address-family
EOF
frrtest2 and frrtest3
# ID=2 on frrtest2
# ID=3 on frrtest3
ID=${HOSTNAME#frrtest}
vtysh <<EOF
configure
interface lo
ip address 10.0.0.$ID/32
interface ens7
no shutdown
interface ens8
no shutdown
vrf vrf10
vni 10
router bgp $ID
no bgp ebgp-requires-policy
neighbor ens7 interface remote-as external
neighbor ens8 interface remote-as external
address-family ipv4 unicast
network 10.0.0.$ID/32
neighbor ens7 activate
neighbor ens8 activate
import vrf vrf10
exit-address-family
address-family l2vpn evpn
advertise-all-vni
neighbor ens8 activate
exit-address-family
router bgp $ID vrf vrf10
address-family ipv4 unicast
redistribute connected
exit-address-family
address-family l2vpn evpn
advertise ipv4 unicast
exit-address-family
EOF
# L3VNI setup
ip link add up vrf10 type vrf table 10
ip link add up br10 type bridge
ip link set br10 master vrf10
ip link add up vni10 type vxlan id 10 local 10.0.0.$ID nolearning dstport 4789
ip link set vni10 master br10
bridge link set dev vni10 learning off
# L2VNI and IRB setup
ip link add up br100 type bridge
ip link set br100 master vrf10
ip link add up vni100 type vxlan id 100 local 10.0.0.$ID nolearning dstport 4789
ip link set vni100 master br100
bridge link set dev vni100 learning off
ip address add 192.168.0.1/24 dev br100
# Mock host setup (to cause type-2 MACIP advertisement)
ip link add up dummy0 type dummy
ip link set dummy0 master br100
bridge fdb add 02:00:00:00:00:$ID$ID dev dummy0 master static sticky
ip neigh add 192.168.0.$ID lladdr 02:00:00:00:00:$ID$ID dev br100
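For reference, the per-node values that the script above derives from the hostname can be previewed without touching the kernel. This sketch only reuses the `${HOSTNAME#frrtest}` expansion and the address patterns from the script; the function name mock_host_params is made up for illustration.

```shell
# Print the loopback, mock-host IP and mock-host MAC that the setup
# script would use on a given node (hypothetical helper, read-only).
mock_host_params() {
    id=${1#frrtest}          # same prefix strip as ID=${HOSTNAME#frrtest}
    echo "lo=10.0.0.$id/32 host=192.168.0.$id mac=02:00:00:00:00:$id$id"
}

mock_host_params frrtest2
mock_host_params frrtest3
```

For frrtest2 this yields the MAC 02:00:00:00:00:22 and the address 192.168.0.2, which are the values that appear in the type-2 routes listed under "Additional context".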
Expected behavior
frrtest should see a direct route to 192.168.0.2 via frrtest2, and a direct route to 192.168.0.3 via frrtest3.
Both frrtest2 and frrtest3 should see routes to 192.168.0.2 and 192.168.0.3 in the default VRF, with an empty AS path for the locally generated route and an AS-path length of one for the route received from the other EVPN node.
Actual behavior
frrtest only sees a route to 192.168.0.2 via frrtest3 (as-path 3 2) and to 192.168.0.3 via frrtest2 (as-path 2 3):
frrtest# show ip bgp 192.168.0.2
BGP routing table entry for 192.168.0.2/32, version 5
Paths: (1 available, best #1, table default)
Advertised to non peer-group peers:
ens7 ens8
3 2
::ffff:a00:3 from ens8 (10.0.0.3)
(fe80::f816:3eff:fe87:37c2) (used)
Origin IGP, valid, external, best (First path received)
Extended Community: ET:8 Rmac:0e:cd:2f:40:19:54
Last update: Tue Jun 4 14:27:28 2024
frrtest# show ip bgp 192.168.0.3
BGP routing table entry for 192.168.0.3/32, version 6
Paths: (1 available, best #1, table default)
Advertised to non peer-group peers:
ens7 ens8
2 3
::ffff:a00:2 from ens7 (10.0.0.2)
(fe80::f816:3eff:fec0:6702) (used)
Origin IGP, valid, external, best (First path received)
Extended Community: ET:8 Rmac:86:42:e3:bc:7c:91
Last update: Tue Jun 4 14:27:28 2024
This is also visible on frrtest2 and frrtest3, each of which only has a route in the default VRF to the remote MACIP address, not to its own:
frrtest2# show ip route vrf default 192.168.0.0/24 longer-prefixes
Codes: K - kernel route, C - connected, L - local, S - static,
R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric, t - Table-Direct,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
B>* 192.168.0.0/24 [20/0] is directly connected, vrf10 (vrf vrf10), weight 1, 01:19:45
B>* 192.168.0.3/32 [20/0] via 10.0.0.3, br10 (vrf vrf10) onlink, weight 1, 01:19:36
frrtest3# show ip route vrf default 192.168.0.0/24 longer-prefixes
Codes: K - kernel route, C - connected, L - local, S - static,
R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric, t - Table-Direct,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
B>* 192.168.0.0/24 [20/0] is directly connected, vrf10 (vrf vrf10), weight 1, 01:19:47
B>* 192.168.0.2/32 [20/0] via 10.0.0.2, br10 (vrf vrf10) onlink, weight 1, 01:19:45
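One way to narrow down where the local host route goes missing would be to look at each table between the neighbor entry and the default VRF on the originating node. The sketch below only prints the vtysh commands for a given node ID so they can be reviewed first; the function name local_route_checks is hypothetical, and the output of these commands is not part of this report.

```shell
# Print (not run) the checks for the node with the given ID.
local_route_checks() {
    id=$1
    host="192.168.0.$id/32"
    # Does zebra know the local neighbor entry on the L2VNI?
    echo "vtysh -c 'show evpn arp-cache vni 100'"
    # Is the host route present in the BGP table of vrf10?
    echo "vtysh -c 'show bgp vrf vrf10 ipv4 unicast $host'"
    # Is the host route present in the default VRF after 'import vrf vrf10'?
    echo "vtysh -c 'show bgp ipv4 unicast $host'"
}

local_route_checks 2
```

On frrtest2, `local_route_checks 2 | sh` would run the three checks in order.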
Additional context
Both frrtest2 and frrtest3 see their local and the remote MACIP route with the expected AS-path lengths (empty for the locally generated type-2, one AS for the remotely generated one):
frrtest2
frrtest2# show bgp l2vpn evpn route type 2
BGP table version is 3, local router ID is 10.0.0.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
Network Next Hop Metric LocPrf Weight Path
Extended Community
Route Distinguisher: 10.0.0.2:3
*> [2]:[0]:[48]:[02:00:00:00:00:22]
10.0.0.2 32768 i
ET:8 RT:2:100 MM:0, sticky MAC
*> [2]:[0]:[48]:[02:00:00:00:00:22]:[32]:[192.168.0.2]
10.0.0.2 32768 i
ET:8 RT:2:100 RT:2:10 Rmac:0e:cd:2f:40:19:54
Route Distinguisher: 10.0.0.3:3
*> [2]:[0]:[48]:[02:00:00:00:00:33]
10.0.0.3 0 3 i
RT:3:100 ET:8 MM:0, sticky MAC
*> [2]:[0]:[48]:[02:00:00:00:00:33]:[32]:[192.168.0.3]
10.0.0.3 0 3 i
RT:3:10 RT:3:100 ET:8 Rmac:86:42:e3:bc:7c:91
Displayed 4 prefixes (4 paths) (of requested type)
frrtest3
frrtest3# show bgp l2vpn evpn route type 2
BGP table version is 3, local router ID is 10.0.0.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
Network Next Hop Metric LocPrf Weight Path
Extended Community
Route Distinguisher: 10.0.0.2:3
*> [2]:[0]:[48]:[02:00:00:00:00:22]
10.0.0.2 0 2 i
RT:2:100 ET:8 MM:0, sticky MAC
*> [2]:[0]:[48]:[02:00:00:00:00:22]:[32]:[192.168.0.2]
10.0.0.2 0 2 i
RT:2:10 RT:2:100 ET:8 Rmac:0e:cd:2f:40:19:54
Route Distinguisher: 10.0.0.3:3
*> [2]:[0]:[48]:[02:00:00:00:00:33]
10.0.0.3 32768 i
ET:8 RT:3:100 MM:0, sticky MAC
*> [2]:[0]:[48]:[02:00:00:00:00:33]:[32]:[192.168.0.3]
10.0.0.3 32768 i
ET:8 RT:3:100 RT:3:10 Rmac:86:42:e3:bc:7c:91
Displayed 4 prefixes (4 paths) (of requested type)
I would be happy to give any interested developer access to this lab (including full sudo access).
Checklist
- I have searched the open issues for this bug.
- I have not included sensitive information in this report.