Exa-Networks / exabgp

The BGP swiss army knife of networking

Is there an option to reduce the time the exabgp will re-try the connection establishment after the peering goes down (default seems to be 60s)

ijukic2003 opened this issue · comments

Describe the bug

Is there an option to reduce the time exabgp waits before retrying connection establishment after the peering goes down (the default seems to be 60s)? Exabgp establishes the BGP peering, and if that connection goes down (for example, because the peer is no longer reachable), exabgp retries the connection only once every 60s.
This is too slow when using exabgp to simulate a huge number of peers connecting at the same time.

Does this patch make things work better for you?

diff --git a/src/exabgp/reactor/peer.py b/src/exabgp/reactor/peer.py
index 2dc5f5c8..7123f879 100644
--- a/src/exabgp/reactor/peer.py
+++ b/src/exabgp/reactor/peer.py
@@ -419,6 +419,7 @@ class Peer(object):
         self.neighbor.rib.outgoing.replace_restart(previous, current)
         self.neighbor.previous = None
 
+        self._delay.reset()
         while not self._teardown:
             # we are here following a configuration change
             if self._neighbor:

I wrote it at a conference without any testing, so it may not do what it should.
That said, it looks like we were missing a reset of the exponential backoff delay timer when a connection is successfully established, so it should work.

Hi Thomas,

Sorry for the delay with the testing. The behaviour is still the same: after the connection goes down, the SYN is sent exactly every 60s.

Thank you for the feedback, I will look into it again.

@ijukic2003 can you please tell me how you are performing your test, and whether you checked 4.2 or the main branch? (The change was only applied to main.)

Testing by causing a connection drop in the code seems to work as expected: the connection delay timer no longer increases when the session can be set up.

I was seeing the delay increase with every failed attempt to reconnect, but it never jumped to 60s immediately. Instead it grew after each failure (up to 60s).
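
For readers following the thread, the behaviour described above (a delay that grows after each failed attempt up to a cap, and is reset once a session is established) can be sketched roughly as follows. This is only an illustration: the class name `BackoffDelay`, its methods, the 1s starting value, and the doubling factor are all assumptions, not ExaBGP's actual implementation. Only the 60s cap comes from the thread.

```python
class BackoffDelay:
    """Hypothetical exponential backoff timer; NOT ExaBGP's real class."""

    def __init__(self, initial=1, maximum=60):
        # initial and the doubling factor are assumed values for illustration;
        # the 60s maximum matches the cap reported in this issue.
        self.initial = initial
        self.maximum = maximum
        self._next = initial

    def reset(self):
        # Called once a session is established, so the next failure
        # starts again from the short initial delay.
        self._next = self.initial

    def increase(self):
        # Return the delay to wait now, then double it for the next
        # failure, never exceeding the maximum.
        current = self._next
        self._next = min(self._next * 2, self.maximum)
        return current


delay = BackoffDelay(initial=1, maximum=60)

# Eight consecutive failed connection attempts.
failures = [delay.increase() for _ in range(8)]
print(failures)  # [1, 2, 4, 8, 16, 32, 60, 60]

# A successful session resets the timer.
delay.reset()
print(delay.increase())  # 1
```

Under these assumed values the delay reaches the 60s cap after roughly a minute of consecutive failures (1 + 2 + 4 + 8 + 16 + 32 = 63s), and a smaller `maximum` would give a more aggressive, effectively fixed retry interval.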

Hi Thomas,

Yes, sorry, I see now that after the connection drops, exabgp tries to reconnect quite quickly, and then the interval between retries increases exponentially with every new connection attempt.
The problem I have is that, for the scale tests I am doing, it takes around 4-5 minutes after the connection-down trigger for the network to stabilize and for the peer to be ready to accept the BGP connection again. By that time the Exa retry timer has already grown back to 60s.
Is there any way to make this a configurable option in the code, so that I can set a more aggressive fixed timer?

As far as I know, this behaviour is now fixed on master. If it needs backporting, let me know.