FRRouting / frr

Description

We already discussed this in FRR general chat, filing the issue as agreed.

Problem Description

Our testing has shown that FRR does not follow the RFC when it comes to respecting the next-hop unchanged status for ipv6 eBGP sessions.

In essence what we have seen across the versions (including v10) is the following:
[RR-client-VM01] ----> [FRR RR] ---> [GW router]

In this setup the RR client sends the IPv6 prefix A/128 with a next-hop set to the IPv6 address of RR-client-VM01.
FRR then takes that prefix and announces it via eBGP, but sends two next-hops: a link-local address (that of the FRR RR itself, which does not follow the RFC) and the proper global IPv6 address. We do use next-hop unchanged, and FRR still imposes its own link-local IPv6 address, which breaks the routing.

If we look at the route-reflector RFC, there is no specific mention of IPv6, link-local, or global addressing, but it does state:

In addition, when a RR reflects a route, it SHOULD NOT modify the following path attributes: NEXT_HOP, AS_PATH, LOCAL_PREF, and MED. Their modification could potentially result in routing loops.
https://datatracker.ietf.org/doc/html/rfc4456#section-10

And in our case, tcpdump still shows the link-local IPv6 address in the next-hop of the announcements, which in this case is erroneous.

This behavior is not seen in other implementations like Cisco/Juniper/BIRD and co...

We did see #10009 and a couple more issues filed on GitHub around this, but the result always seems to be the same: people are told to make the peer prefer the global next-hop.

Unfortunately the peers we are dealing with (GW router) do not have an option to configure set ipv6 next-hop prefer-global in a route map, which actually breaks our entire IPv6 deployment and makes FRR not usable in this use-case for ipv6 load…

Potential portion of the code that does not seem right

It seems to be this particular check that is causing the LL NH to be added:

frr/bgpd/bgp_route.c

Line 2485 in d5b0c76

if ((CHECK_FLAG(peer->af_flags[afi][safi],

Without this portion (which erroneously modifies the NH and adds the LL), we get a properly formed announcement, and the RR does not impose its LL on top of the global.

Thanks & best regards.

Version

We tested 8/9/10 - all behave the same

How to reproduce

We can provide the configurations (cleaned up), but in essence:

One follows the topology described above: RR ipv6 peers w/ RR client; RR client sends a prefix w/ a next-hop of RR client itself (iBGP)
RR FRR peers w/ GW (can be simulated by another FRR) - via eBGP w/ next-hop unchanged

Observe tcpdump and the announcements from the FRR RR towards the GW FRR: the FRR RR mistakenly imposes its link-local address towards the eBGP peer.

Expected behavior

FRR RR does not get in the way, respects the RR RFC, and does not modify the NEXT_HOP attribute in any way.

Actual behavior

NEXT_HOP as:

Proper global ipv6 address (expected)
FRR router's link-local (not expected)
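
For anyone checking a capture by hand, the two cases can be told apart by the length of the next-hop field in MP_REACH_NLRI: 16 bytes means global only, 32 bytes means global followed by a link-local. A minimal sketch of that decoding follows; the struct and function names are hypothetical and are not FRR or tcpdump code.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for a decoded IPv6 next-hop field. */
struct mp_nexthop {
	bool has_link_local;
	struct in6_addr global;
	struct in6_addr link_local;
};

/* Decode the "Network Address of Next Hop" bytes of an IPv6 MP_REACH_NLRI.
 * len is the "Length of Next Hop Network Address" value from the capture.
 * Returns false for any length other than 16 or 32. */
bool decode_mp_nexthop(const uint8_t *buf, size_t len, struct mp_nexthop *out)
{
	assert(buf != NULL && out != NULL);

	if (len != 16 && len != 32)
		return false;

	memset(out, 0, sizeof(*out));
	memcpy(&out->global, buf, 16);
	if (len == 32) {
		/* The second 16 bytes carry the link-local address. */
		memcpy(&out->link_local, buf + 16, 16);
		out->has_link_local = true;
	}
	return true;
}
```

In the behaviour reported here, the announcement towards the GW decodes with has_link_local set, and link_local is the FRR RR's own address rather than anything the RR client sent.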

Additional context

No response

Checklist

I have searched the open issues for this bug.
I have not included sensitive information in this report.

I was doing some experiments and captures and wanted to provide the following to help debug further ...

The following diffs will change the behaviour and prevent the link-local address getting added

diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index 94c21e186..3785643e6 100644
--- a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -2487,6 +2487,7 @@ bool subgroup_announce_check(struct bgp_dest *dest, struct bgp_path_info *pi,
                     && IN6_IS_ADDR_LINKLOCAL(&attr->mp_nexthop_local))
                    || (!reflect && !transparent
                        && IN6_IS_ADDR_LINKLOCAL(&peer->nexthop.v6_local)
+                       && IN6_IS_ADDR_LINKLOCAL(&attr->mp_nexthop_local)
                        && peer->shared_network
                        && (from == bgp->peer_self
                            || peer->sort == BGP_PEER_EBGP))) {

In our topology, the iBGP RR client does NOT send a link-local in the next-hop
Our RR eBGP neighbor IS NOT configured with link-local peering
- what is &peer->nexthop.v6_local?
Our RR eBGP neighbor IS on a shared_network
Our RR eBGP neighbor IS NOT an RR client or an RS client
I am not sure what the final test, && (from == bgp->peer_self || peer->sort == BGP_PEER_EBGP))), is doing, but I guess it evaluates to TRUE
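
The conditions listed above can be traced in a small standalone model of the check. This is only a sketch: the struct, its field names, and the fe80::1 address are hypothetical stand-ins for the values the real code reads from peer, attr, and bgp, and it assumes peer->nexthop.v6_local holds the RR's own link-local on the shared interface.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified inputs to the check in subgroup_announce_check(). */
struct nh_check_input {
	bool nexthop_local_unchanged;  /* PEER_FLAG_NEXTHOP_LOCAL_UNCHANGED is set */
	bool reflect;
	bool transparent;
	bool shared_network;           /* peer->shared_network */
	bool from_self;                /* from == bgp->peer_self */
	bool peer_is_ebgp;             /* peer->sort == BGP_PEER_EBGP */
	struct in6_addr attr_nh_local; /* attr->mp_nexthop_local */
	struct in6_addr peer_v6_local; /* peer->nexthop.v6_local */
};

/* Returns true when the check would select "global and link-local".
 * with_proposed_check adds the extra test from the diff above. */
bool advertises_global_and_ll(const struct nh_check_input *in,
			      bool with_proposed_check)
{
	bool keep_received_ll = in->nexthop_local_unchanged
				&& IN6_IS_ADDR_LINKLOCAL(&in->attr_nh_local);
	bool add_ll = !in->reflect && !in->transparent
		      && IN6_IS_ADDR_LINKLOCAL(&in->peer_v6_local)
		      && (!with_proposed_check
			  || IN6_IS_ADDR_LINKLOCAL(&in->attr_nh_local))
		      && in->shared_network
		      && (in->from_self || in->peer_is_ebgp);

	return keep_received_ll || add_ll;
}

/* The topology from this report: the client sent no link-local,
 * the eBGP neighbor is on a shared network and is not an RR/RS client. */
struct nh_check_input reported_topology(void)
{
	struct nh_check_input in;
	int rc;

	memset(&in, 0, sizeof(in)); /* attr_nh_local stays "::" */
	in.shared_network = true;
	in.peer_is_ebgp = true;
	rc = inet_pton(AF_INET6, "fe80::1", &in.peer_v6_local); /* made-up address */
	assert(rc == 1);
	(void)rc;
	return in;
}
```

With these inputs the second branch is what selects the global-and-link-local length, and the extra test from the diff is what turns it off. Setting nexthop_local_unchanged makes no difference in this model because the received link-local is empty.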

Another query is that I don't see part of the code actually adding the RR link-local to the next-hop, only setting:

attr->mp_nexthop_len = BGP_ATTR_NHLEN_IPV6_GLOBAL_AND_LL;

... so I guess some other part of the code is filling in the blank here?

frr/bgpd/bgp_updgrp_packet.c

Line 545 in b436e96

mod_v6nhl = &peer->nexthop.v6_local;

Finally, I wonder about this statement in the RFC and the observed behaviour:

The link-local address shall be included in the Next Hop field if and
only if the BGP speaker shares a common subnet with the entity
identified by the global IPv6 address carried in the Network Address
of Next Hop field and the peer the route is being advertised to.

Which link-local should be included: that of the originator of the route (the true next-hop) or that of the intermediary, in a topology (client)--ibgp-->(RR)--ebgp-->(DCR) where they all share the same subnet?
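
One possible reading of the quoted rule, as a rough sketch: the next-hop length is 32 only when the speaker, the global next-hop, and the peer all sit on one subnet, and 16 otherwise. The function names, the prefix-length parameter, and the example addresses are assumptions made for illustration. The sketch only decides the length; it does not answer whose link-local belongs in the second half.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Parse a textual IPv6 address; aborts on malformed input. */
struct in6_addr addr6(const char *text)
{
	struct in6_addr a;
	int rc = inet_pton(AF_INET6, text, &a);

	assert(rc == 1);
	(void)rc;
	return a;
}

/* True when a and b agree on their first prefix_len bits. */
bool same_subnet(const struct in6_addr *a, const struct in6_addr *b,
		 unsigned prefix_len)
{
	unsigned full = prefix_len / 8;
	unsigned rem = prefix_len % 8;
	uint8_t mask;

	assert(prefix_len <= 128);
	if (memcmp(a->s6_addr, b->s6_addr, full) != 0)
		return false;
	if (rem == 0)
		return true;
	mask = (uint8_t)(0xFF << (8 - rem));
	return (a->s6_addr[full] & mask) == (b->s6_addr[full] & mask);
}

/* 32 (global + link-local) if and only if the speaker shares the subnet
 * with both the global next-hop and the peer; 16 (global only) otherwise. */
unsigned rfc2545_nexthop_len(const struct in6_addr *speaker,
			     const struct in6_addr *nexthop_global,
			     const struct in6_addr *peer, unsigned prefix_len)
{
	bool shared = same_subnet(speaker, nexthop_global, prefix_len)
		      && same_subnet(speaker, peer, prefix_len);

	return shared ? 32 : 16;
}
```

For a client, RR, and DCR that all live in one /64 this yields 32, which is why the question of which link-local to carry matters in this topology.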

In addition, when a RR reflects a route, it SHOULD NOT modify the following path attributes: NEXT_HOP, AS_PATH, LOCAL_PREF, and MED. Their modification could potentially result in routing loops.

But it doesn't say anything about MP_REACH_NLRI attribute. NEXT_HOP is IPv4 only.

Our testing has shown that FRR does not follow the RFC when it comes to respecting the next-hop unchanged status for ipv6 eBGP sessions.

Did you use nexthop-local unchanged or attribute-unchanged next-hop?

@ton31337 - I did some experimentation with those flags but in our scenario they had no effect. I was able to isolate the code where I believe our issue is triggered (see #16200). I think there is an assumption in the FRR code that the peer originating the IPv6 prefix, on the shared network, has actually added its link-local as a next-hop. In our case it hasn't, which results in FRR inserting its own link-local, which is not what we want.

BIRD on the other hand handles this case more gracefully but we'd really like to avoid having to move all our RRs to BIRD :-(

--- update ---

Ok #16200 clearly breaks many other cases, but I think it illustrates our problem at least.

I did some experimentation with those flags but in our scenario they had no effect.

You say it's not working, but bcf094e says it's (PEER_FLAG_NEXTHOP_LOCAL_UNCHANGED) working, or I missed something?

The check for PEER_FLAG_NEXTHOP_LOCAL_UNCHANGED is an AND with IN6_IS_ADDR_LINKLOCAL(&attr->mp_nexthop_local), which doesn't apply in our case since IN6_IS_ADDR_LINKLOCAL(&attr->mp_nexthop_local) is false: no link-local next-hop was received.

Also this code expects to PRESERVE the link-local but in our case it does not exist. So we hit the second (OR) test, since we are on a common network.

So now, even in the absence of a link-local next-hop, we do set:

			else
				attr->mp_nexthop_len =
					BGP_ATTR_NHLEN_IPV6_GLOBAL_AND_LL;

This later causes the update formation code to insert the wrong link-local address.
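
To illustrate how setting only the length can lead to the RR's address ending up on the wire, here is a simplified model of the encoding step. It is a sketch under an assumption, not the actual logic in bgp_updgrp_packet.c: it assumes that when the length is 32 and the attribute carries no link-local, the encoder falls back to the peer's v6_local (as the mod_v6nhl assignment quoted earlier suggests). The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of next-hop encoding for an IPv6 MP_REACH_NLRI.
 * out must have room for 32 bytes. Returns the number of bytes written. */
size_t encode_mp_nexthop(uint8_t *out, unsigned nh_len,
			 const struct in6_addr *global,
			 const struct in6_addr *attr_ll,
			 const struct in6_addr *peer_v6_local)
{
	const struct in6_addr *ll;

	assert(out != NULL && global != NULL);
	assert(attr_ll != NULL && peer_v6_local != NULL);

	memcpy(out, global, 16);
	if (nh_len != 32)
		return 16; /* global only */

	/* Assumed fallback: with no received link-local, the second slot is
	 * filled from the speaker's own link-local towards this peer. */
	ll = IN6_IS_ADDR_LINKLOCAL(attr_ll) ? attr_ll : peer_v6_local;
	memcpy(out + 16, ll, 16);
	return 32;
}
```

Under that assumption, once the length is 32 there is no way to leave the second slot empty, so an all-zero attr->mp_nexthop_local ends up replaced by the RR's own link-local.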

For me it makes sense to add this check (attr->mp_nexthop_local) if PEER_FLAG_NEXTHOP_LOCAL_UNCHANGED is also configured...

/cc @riw777

Is NEXTHOP_LOCAL_UNCHANGED not implicit when the peers are on the same network segment?

If you are asking whether this flag is set automatically when the peers are on the same network segment, then no.