Job scheduling for agentx in case of trap burst leads to bgpd freeze
fdumontet6WIND opened this issue · comments
Description
The AgentX protocol for trap notifications is based on a notification message and its response.
              NOTIFICATION
snmpd <----------------------- agentX/bgpd
                socket A

                RESPONSE
snmpd -----------------------> agentX/bgpd
                socket B
snmpd sends the response immediately on reception of the notification.
On the agentX/bgpd side, reading is done by a dedicated thread function, "agentx_read".
When a burst of neighbor creations/deletions occurs, "agentx_read" is not called because multiple other threads have higher priority.
In that case, a burst of trap notifications is sent to snmpd through socket A.
snmpd immediately answers this burst with responses to agentX/bgpd through socket B.
Since "agentx_read" is not called, the socket B write buffer fills up and the snmpd write blocks (snmpd uses a blocking write call).
snmpd therefore stops reading notifications from socket A. Consequently, the socket A write buffer fills up as well and agentX is blocked (the snmp library uses a blocking write call).
Finally, bgpd is fully blocked and freezes.
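The buffer-fill mechanism described above can be reproduced in isolation. The following is only a minimal sketch, not FRR or net-snmp code: two local socket pairs stand in for "socket A" and "socket B", and the helper names (fill_until_blocked, drain) are hypothetical. Non-blocking sockets are used so that the script reports the point where a blocking write would hang instead of hanging itself.

```python
import socket


def fill_until_blocked(sock, chunk=b"x" * 4096):
    """Write until the kernel buffer is full.

    Returns the number of bytes accepted before a blocking write
    would have stalled. Hypothetical helper for this sketch only.
    """
    sock.setblocking(False)
    total = 0
    while True:
        try:
            total += sock.send(chunk)
        except BlockingIOError:
            return total


def drain(sock):
    """Read everything currently queued (the role agentx_read plays)."""
    sock.setblocking(False)
    total = 0
    while True:
        try:
            total += len(sock.recv(65536))
        except BlockingIOError:
            return total


# Stand-ins for the two directions named in this report (assumption:
# plain local socket pairs, no AgentX framing at all).
agent_a, snmpd_a = socket.socketpair()  # socket A: notifications, agent -> snmpd
snmpd_b, agent_b = socket.socketpair()  # socket B: responses, snmpd -> agent

# agentx_read is starved: nothing reads agent_b, so responses pile up
# until snmpd's write would block.
responses_queued = fill_until_blocked(snmpd_b)

# A blocked snmpd no longer reads socket A, so the agent's notifications
# pile up as well until the agent's write would block too.
notifications_queued = fill_until_blocked(agent_a)

print("socket B full after", responses_queued, "bytes")
print("socket A full after", notifications_queued, "bytes")

# Running the reader on socket B is what breaks the cycle.
drained = drain(agent_b)
print("drained", drained, "bytes; snmpd can write again")
```

With blocking writes on both sides, as described in this report, both peers would be stuck at the points where the sketch catches BlockingIOError, and neither would get back to reading.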
Version
Hello, this is FRRouting (version 10.0 (frr-10.0-878-gde5898c37e)).
All FRR versions
All net-snmp versions
How to reproduce
Enable traps for bgpd.
Configure a large number of neighbors.
Create and delete these neighbors in a burst.
Expected behavior
bgpd continues to work
Actual behavior
bgpd freezes without crashing; the system is blocked
Additional context
Occurs on site
Checklist
- I have searched the open issues for this bug.
- I have not included sensitive information in this report.