batfish / batfish

Batfish is a network configuration analysis tool that can find bugs and guarantee the correctness of (planned or current) network configurations. It enables network engineers to rapidly and safely evolve their network, without fear of outages or security breaches.

Home Page: http://www.batfish.org

Traceroute in batfish says no route but traceroute from the router works fine

bharmarsameer opened this issue · comments

Describe the bug and expected behavior

Screenshot 2024-03-28 at 3 59 45 PM

In the above topology I am trying to trace from R1's lo0 to R6's lo0. When I log in to R1 directly and run traceroute with source lo0, the trace works fine, but Batfish shows no route.

R1(config-router-bgp)#traceroute 2.2.2.3 source lo0
traceroute to 2.2.2.3 (2.2.2.3), 30 hops max, 60 byte packets
1 192.168.1.2 (192.168.1.2) 0.088 ms 0.020 ms 0.017 ms
2 2.2.2.3 (2.2.2.3) 1.515 ms 1.922 ms 2.299 ms

Runnable example

import pandas as pd
from pybatfish.client.session import Session
from pybatfish.datamodel import *
from pybatfish.datamodel.answer import *
from pybatfish.datamodel.flow import *
%run startup.py
bf = Session(host="localhost")
# Initialize the example network and snapshot
NETWORK_NAME = "example_network"
BASE_SNAPSHOT_NAME = "base"
SNAPSHOT_PATH = "./snapshot"
bf.set_network(NETWORK_NAME)
bf.init_snapshot(SNAPSHOT_PATH, name=BASE_SNAPSHOT_NAME, overwrite=True)
tr_answer = bf.q.traceroute(startLocation='/R1$/[Loopback0]', headers=HeaderConstraints(dstIps='2.2.2.3/32'), maxTraces=3).answer()
show(tr_answer.frame())

Additional context
R1-config.txt
R2-config.txt
R3-config.txt
R4-config.txt
R5-config.txt
R6-config.txt
SW1-config.txt
Screenshot 2024-03-28 at 4 32 33 PM

If you are using VRRP, you need to be supplying L1 topology. Are you doing that?

You may wish to run the vrrpProperties question to see what it's identified

Same for the switch -- definitely need to supply L1 topology anytime L2 concepts are used!
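A Layer 1 topology can be supplied as a JSON file inside the snapshot. The sketch below builds one edge for the only link that can be read off this thread (R1 Ethernet4/1 and R4 Ethernet3/1 both sit in 192.168.1.0/30). It assumes the `layer1_topology.json` shape with `node1`/`node2` entries holding `hostname` and `interfaceName`, placed under the snapshot's `batfish/` directory; the helper name `l1_edge` is made up for illustration, and the remaining links would have to come from the real cabling.

```python
import json


def l1_edge(host1, iface1, host2, iface2):
    """Build one Layer 1 edge (hypothetical helper, assumed file shape)."""
    return {
        "node1": {"hostname": host1, "interfaceName": iface1},
        "node2": {"hostname": host2, "interfaceName": iface2},
    }


# Only link recoverable from the thread: both ends are in 192.168.1.0/30.
edges = [l1_edge("R1", "Ethernet4/1", "R4", "Ethernet3/1")]
topology = {"edges": edges}

# Serialize; the result is meant to be saved as batfish/layer1_topology.json
# inside the snapshot directory (assumed location).
text = json.dumps(topology, indent=2)
print(text)
```

The hostnames must match what Batfish derives from the configs, so it is worth checking them before re-initializing the snapshot.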

Looking a bit deeper -- Batfish disables management interfaces by default as the management network usually gets in the way of analysis. I saw that you had that on the switch.

But the management interface is not used for any routing. Also, I am specifying Lo0 as the source in the trace.
I just tried shutting down all management interfaces. Still doesn't work. :(

I dug in a little bit, still only shallowly.

In Batfish on my machine with these configs, r4 is not advertising 2.2.2.3/32 to r1 because 2.2.2.3/32 is under RIB failure: it is the best BGP route, but the OSPF route is better.
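One way to reproduce this check is to pull the main RIB and the BGP RIB for the prefix side by side. This is a minimal sketch, assuming `bf` is an initialized pybatfish `Session` with the snapshot loaded and that the `routes` question accepts `rib="main"` and `rib="bgp"`; the helper name `compare_ribs` is made up for illustration.

```python
def compare_ribs(bf, node, prefix):
    """Return (main_rib, bgp_rib) answers for one prefix on one node.

    `bf` is assumed to be a pybatfish Session with a snapshot initialized.
    """
    # Routes that were actually installed for forwarding.
    main_rib = bf.q.routes(nodes=node, network=prefix, rib="main").answer().frame()
    # Routes that BGP holds, whether or not they won in the main RIB.
    bgp_rib = bf.q.routes(nodes=node, network=prefix, rib="bgp").answer().frame()
    return main_rib, bgp_rib


# Usage with the session from the runnable example (needs a Batfish service):
# main_rib, bgp_rib = compare_ribs(bf, "r4", "2.2.2.3/32")
# print(main_rib)
# print(bgp_rib)
```

If the main RIB shows only an OSPF entry while the BGP RIB still holds the prefix, that matches the situation described above.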

Screenshot 2024-03-28 at 2 49 50 PM

What is happening in your emulator?

OSPF is only for loopback distribution within the AS; outside the AS everything is advertised using eBGP. I see the routes being advertised to R1 from R4. Routing looks fine on the virtual switches themselves.

R4#sh ip bgp nei 192.168.1.1 advertised-routes
BGP routing table information for VRF default
Router identifier 2.2.2.1, local AS number 2
Route status codes: s - suppressed contributor, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast, q - Queued for advertisement
                    % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI Origin Validation codes: V - valid, I - invalid, U - unknown
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
 * >      2.2.2.1/32             192.168.1.2           -       -          -       -       2 i
 * >      2.2.2.2/32             192.168.1.2           -       -          -       -       2 i
 * >      2.2.2.3/32             192.168.1.2           -       -          -       -       2 i
 * >      2.2.2.100/32           192.168.1.2           -       -          -       -       2 i

Can you add show route for the main RIB?

Also, note that I added ``` around your message so that it rendered correctly, not as markdown :)

show ip route for all the devices or any specific ones?

R4 and R1

R1#sh ip route

VRF: default
Codes: C - connected, S - static, K - kernel,
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - Other BGP Routes,
       B I - iBGP, B E - eBGP, R - RIP, I L1 - IS-IS level 1,
       I L2 - IS-IS level 2, O3 - OSPFv3, A B - BGP Aggregate,
       A O - OSPF Summary, NG - Nexthop Group Static Route,
       V - VXLAN Control Service, M - Martian,
       DH - DHCP client installed default route,
       DP - Dynamic Policy Route, L - VRF Leaked,
       G  - gRIBI, RC - Route Cache Route,
       CL - CBF Leaked Route

Gateway of last resort is not reachable

 C        1.1.0.0/30 is directly connected, Ethernet1/1
 C        1.1.0.4/30 is directly connected, Ethernet2/1
 O        1.1.0.8/30 [110/20] via 1.1.0.2, Ethernet1/1
                              via 1.1.0.6, Ethernet2/1
 C        1.1.1.1/32 is directly connected, Loopback0
 O        1.1.1.2/32 [110/20] via 1.1.0.2, Ethernet1/1
 O        1.1.1.3/32 [110/20] via 1.1.0.6, Ethernet2/1
 C        1.1.1.100/32 is directly connected, Loopback100
 C        1.1.2.0/24 is directly connected, Vlan100
 B E      2.2.2.1/32 [200/0] via 192.168.1.2, Ethernet4/1
 B E      2.2.2.2/32 [200/0] via 192.168.1.2, Ethernet4/1
 B E      2.2.2.3/32 [200/0] via 192.168.1.2, Ethernet4/1
 B E      2.2.2.100/32 [200/0] via 192.168.1.2, Ethernet4/1
 C        192.168.1.0/30 is directly connected, Ethernet4/1

R4#sh ip route

VRF: default
Codes: C - connected, S - static, K - kernel,
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - Other BGP Routes,
       B I - iBGP, B E - eBGP, R - RIP, I L1 - IS-IS level 1,
       I L2 - IS-IS level 2, O3 - OSPFv3, A B - BGP Aggregate,
       A O - OSPF Summary, NG - Nexthop Group Static Route,
       V - VXLAN Control Service, M - Martian,
       DH - DHCP client installed default route,
       DP - Dynamic Policy Route, L - VRF Leaked,
       G  - gRIBI, RC - Route Cache Route,
       CL - CBF Leaked Route

Gateway of last resort:
 S        0.0.0.0/0 [1/0] via 192.168.123.1, Management1

 B E      1.1.1.1/32 [200/0] via 192.168.1.1, Ethernet3/1
 B E      1.1.1.2/32 [200/0] via 192.168.1.1, Ethernet3/1
 B E      1.1.1.3/32 [200/0] via 192.168.1.1, Ethernet3/1
 B E      1.1.1.100/32 [200/0] via 192.168.1.1, Ethernet3/1
 B E      1.1.2.0/24 [200/0] via 192.168.1.1, Ethernet3/1
 C        2.2.0.0/30 is directly connected, Ethernet1/1
 C        2.2.0.4/30 is directly connected, Ethernet2/1
 O        2.2.0.8/30 [110/20] via 2.2.0.2, Ethernet1/1
                              via 2.2.0.6, Ethernet2/1
 C        2.2.2.1/32 is directly connected, Loopback0
 O        2.2.2.2/32 [110/20] via 2.2.0.2, Ethernet1/1
 O        2.2.2.3/32 [110/20] via 2.2.0.6, Ethernet2/1
 C        2.2.2.100/32 is directly connected, Loopback100
 C        192.168.1.0/30 is directly connected, Ethernet3/1
 C        192.168.123.0/24 is directly connected, Management1

This is very surprising. Can you attach show run all from r4?

Sorry for not being explicit, but can you please include the all? show run does not have the hidden defaults I'm questioning :)

Sorry. attached it as a file.
R4_sh_run_all.txt

So I'll tell you why I'm confused:

  1. R4 has the OSPF 2.2.2.3/32 in its main RIB
  2. R4 has the BGP 2.2.2.3/32 in its BGP RIB (IBGP route learned from R6)
  3. R4 has no bgp advertise-inactive in the show run all.

To me, that says that 2.2.2.3/32 IBGP route should NOT be advertised to R1 -- it's inactive.
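The reasoning in the three points above can be written down as a small model. It is an illustration of the expected rule only (lowest administrative distance wins the main RIB, and a BGP route is advertised only when it is the winner unless advertise-inactive is on), not a statement of what the device actually does. The `Route` class and function names are made up; the distances are 110 for OSPF, as in R4's route table, and an assumed 200 for the iBGP route, matching the BGP entries shown in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    protocol: str
    admin_distance: int


def main_rib_winner(candidates):
    """Pick the candidate with the lowest administrative distance."""
    return min(candidates, key=lambda r: r.admin_distance)


def should_advertise(bgp_route, candidates, advertise_inactive=False):
    """Expected rule: advertise the BGP route only if it is active in the
    main RIB, unless advertise-inactive is configured."""
    active = main_rib_winner(candidates) == bgp_route
    return active or advertise_inactive


ospf = Route("2.2.2.3/32", "ospf", 110)  # [110/20] in R4's sh ip route
ibgp = Route("2.2.2.3/32", "ibgp", 200)  # assumed distance for the route from R6

print(main_rib_winner([ospf, ibgp]).protocol)                         # ospf
print(should_advertise(ibgp, [ospf, ibgp]))                           # False
print(should_advertise(ibgp, [ospf, ibgp], advertise_inactive=True))  # True
```

Under this model R4 would not send 2.2.2.3/32 to R1, which is what Batfish computes and what the device output appears to contradict.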

But your R1 show data says it is. Can you explain the difference?

I haven't looked at bgp advertise-inactive. However, the routing looks fine: OSPF is only used as the IGP, and since R1 and R4 are eBGP neighbors, the routes learned by R4 are advertised to R1. In point 2 of the comment above, you mean learned from R6, right? R4 learns the iBGP route for 2.2.2.3/32 from R6.

  • R4 has the BGP 2.2.2.3/32 in its BGP RIB (IBGP route learned from R6)

In fact we have this configuration everywhere in production with the default no bgp advertise-inactive. I have run Batfish on the production device configurations and it runs fine; I see the expected results.

Yes, I updated my prior comment to say R6.

Can you add show ip bgp for R4?

R4# sh ip bgp
BGP routing table information for VRF default
Router identifier 2.2.2.1, local AS number 2
Route status codes: s - suppressed contributor, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast
                    % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI Origin Validation codes: V - valid, I - invalid, U - unknown
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
 * >      1.1.1.1/32             192.168.1.1           0       -          500     0       1 i
 * >      1.1.1.2/32             192.168.1.1           0       -          500     0       1 i
 * >      1.1.1.3/32             192.168.1.1           0       -          500     0       1 i
 * >      1.1.1.100/32           192.168.1.1           0       -          500     0       1 i
 * >      1.1.2.0/24             192.168.1.1           0       -          500     0       1 i
 * >      2.2.2.1/32             -                     -       -          -       0       i
 * >      2.2.2.2/32             2.2.2.2               0       -          100     0       i
 * >      2.2.2.3/32             2.2.2.3               0       -          100     0       i
 * >      2.2.2.100/32           -                     -       -          -       0       i
 *        2.2.2.100/32           2.2.2.2               0       -          100     0       i

Here you go

I can't understand why 2.2.2.3/32 is considered active in BGP (the >) given that it is not installed in the main RIB. I thought that was the definition of active :).
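The mismatch can be made concrete by cross-checking the two outputs pasted above. This sketch lists prefixes that R4's `sh ip bgp` marks active (`>`) with a real next hop but whose main RIB entry carries a non-BGP code. The rows are copied from this thread; the parsing is deliberately naive and the function names are made up for illustration.

```python
import re

# Rows copied from R4's "sh ip bgp" output in this thread.
BGP_ROWS = """\
 * >      2.2.2.1/32             -                     -       -          -       0       i
 * >      2.2.2.2/32             2.2.2.2               0       -          100     0       i
 * >      2.2.2.3/32             2.2.2.3               0       -          100     0       i
 * >      2.2.2.100/32           -                     -       -          -       0       i
 *        2.2.2.100/32           2.2.2.2               0       -          100     0       i
"""

# Rows copied from R4's "sh ip route" output in this thread.
ROUTE_ROWS = """\
 C        2.2.2.1/32 is directly connected, Loopback0
 O        2.2.2.2/32 [110/20] via 2.2.0.2, Ethernet1/1
 O        2.2.2.3/32 [110/20] via 2.2.0.6, Ethernet2/1
 C        2.2.2.100/32 is directly connected, Loopback100
"""

PREFIX = re.compile(r"^\d+\.\d+\.\d+\.\d+/\d+$")


def bgp_active_learned(text):
    """Prefixes flagged '* >' in BGP that have a real next hop (not '-')."""
    found = []
    for line in text.splitlines():
        fields = line.split()
        # An active row splits as ['*', '>', prefix, next_hop, ...].
        if len(fields) >= 4 and fields[:2] == ["*", ">"] and fields[3] != "-":
            found.append(fields[2])
    return found


def main_rib_codes(text):
    """Map prefix -> protocol code (e.g. 'O', 'C', 'B E') from route rows."""
    codes = {}
    for line in text.splitlines():
        fields = line.split()
        for i, field in enumerate(fields):
            if PREFIX.match(field):
                if i > 0:
                    codes[field] = " ".join(fields[:i])
                break
    return codes


codes = main_rib_codes(ROUTE_ROWS)
mismatch = [p for p in bgp_active_learned(BGP_ROWS)
            if not codes.get(p, "").startswith("B")]
print(mismatch)  # ['2.2.2.2/32', '2.2.2.3/32']
```

Both learned loopbacks show up: BGP flags them active while the main RIB holds the OSPF route for the same prefix.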

It is installed in the RIB using OSPF, right? sh ip route does show that. It's the Loopback0 IP address of R6.

Right. So here's the EOS documentation I'm referencing (which confirms what's in my head): https://www.arista.com/en/um-eos/eos-border-gateway-protocol-bgp

By default, BGP will advertise only those routes that are active in the switch’s RIB. This can contribute to dropped traffic. If a preferred route is available through another protocol (like OSPF), the BGP route will become inactive and not be advertised; if the preferred route is lost, there is no available route to the affected peers. Advertising inactive BGP routes minimizes traffic loss by providing alternative routes.

The bgp advertise-inactive command causes BGP to advertise inactive routes to BGP neighbors. Inactive route advertisement is configured globally, but the global setting can be overridden on a per-VRF basis.

Note the text I bolded: If a preferred route is available through another protocol (like OSPF), the BGP route will become inactive and not be advertised

Right, but I think this only applies to iBGP, which makes sense because the admin distance of iBGP is 200 and OSPF is 110. Here it is eBGP (admin distance 20) between R1 <> R4. That's the reason the RIB learns the route via OSPF and then advertises it to its eBGP neighbor, which is expected behaviour in this kind of topology.

But we're talking about whether R4 advertises it, not whether R1 does -- right? So on R4 110 < 200 which is why OSPF is in the R4's main rib.

But that is fine. R4 advertises that to R1, which is an eBGP neighbor, not to its iBGP neighbors. R4, R5, and R6 will all learn it via OSPF.

On R4, this should be true:

If a preferred route is available through another protocol (like OSPF), the BGP route will become inactive and not be advertised

So it should not advertise the route to R1.

As far as I can tell, this is not specific to IBGP routes or EBGP routes, or the remote-as of the neighbor

Hmmmm, AFAIK this is the standard design, and the actual switch shows the same. All traces / pings work as expected from the switch. If you want to advertise 2.2.2.3/32 outside the BGP domain, eBGP is the way to go. That's what is happening here: all loopbacks are learned via OSPF within the OSPF domain, and then those loopbacks are advertised to the eBGP neighbors using eBGP. The same happens from R5 > R2 as well. Take a look below.

R5#sh ip bgp nei 192.168.2.1 advertised-routes 
BGP routing table information for VRF default
Router identifier 2.2.2.2, local AS number 2
Route status codes: s - suppressed contributor, * - valid, > - active, E - ECMP head, e - ECMP
                    S - Stale, c - Contributing to ECMP, b - backup, L - labeled-unicast, q - Queued for advertisement
                    % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI Origin Validation codes: V - valid, I - invalid, U - unknown
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  AIGP       LocPref Weight  Path
 * >      1.1.1.1/32             192.168.2.2           -       -          -       -       2 1 i
 * >      1.1.1.2/32             192.168.2.2           -       -          -       -       2 1 i
 * >      1.1.1.3/32             192.168.2.2           -       -          -       -       2 1 i
 * >      1.1.1.100/32           192.168.2.2           -       -          -       -       2 1 i
 * >      1.1.2.0/24             192.168.2.2           -       -          -       -       2 1 i
 * >      2.2.2.1/32             192.168.2.2           -       -          -       -       2 i
 * >      2.2.2.2/32             192.168.2.2           -       -          -       -       2 i
 * >      2.2.2.3/32             192.168.2.2           -       -          -       -       2 i
 * >      2.2.2.100/32           192.168.2.2           -       -          -       -       2 i
R5#sh ip route 

VRF: default
Codes: C - connected, S - static, K - kernel, 
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - Other BGP Routes,
       B I - iBGP, B E - eBGP, R - RIP, I L1 - IS-IS level 1,
       I L2 - IS-IS level 2, O3 - OSPFv3, A B - BGP Aggregate,
       A O - OSPF Summary, NG - Nexthop Group Static Route,
       V - VXLAN Control Service, M - Martian,
       DH - DHCP client installed default route,
       DP - Dynamic Policy Route, L - VRF Leaked,
       G  - gRIBI, RC - Route Cache Route,
       CL - CBF Leaked Route

Gateway of last resort:
 S        0.0.0.0/0 [1/0] via 192.168.123.1, Management1

 B I      1.1.1.1/32 [200/0] via 2.2.0.1, Ethernet1/1
 B I      1.1.1.2/32 [200/0] via 2.2.0.1, Ethernet1/1
 B I      1.1.1.3/32 [200/0] via 2.2.0.1, Ethernet1/1
 B I      1.1.1.100/32 [200/0] via 2.2.0.1, Ethernet1/1
 B I      1.1.2.0/24 [200/0] via 2.2.0.1, Ethernet1/1
 C        2.2.0.0/30 is directly connected, Ethernet1/1
 O        2.2.0.4/30 [110/20] via 2.2.0.1, Ethernet1/1
                              via 2.2.0.10, Ethernet2/1
 C        2.2.0.8/30 is directly connected, Ethernet2/1
 O        2.2.2.1/32 [110/20] via 2.2.0.1, Ethernet1/1
 C        2.2.2.2/32 is directly connected, Loopback0
 O        2.2.2.3/32 [110/20] via 2.2.0.10, Ethernet2/1
 C        2.2.2.100/32 is directly connected, Loopback100
 C        192.168.2.0/30 is directly connected, Ethernet3/1
 C        192.168.123.0/24 is directly connected, Management1