panic in sync_switch_configuration.rs with "bgp config is present but announce set is not populated"
sunshowers opened this issue
During today's dogfood mupdate, we found a core dump on gc08 (rsync'd over to /staff/dock/rack2/mupdate-20240329/cores/sled-08/core.oxz_nexus_65a11c18-7f59-41ac-b9e7-680627f996e7.nexus.5125.1711667683).
Based on timestamps, this corresponds to the following message in the log file /pool/ext/8a199f12-4f5c-483a-8aca-f97856658a35/crypt/debug/oxz_nexus_65a11c18-7f59-41ac-b9e7-680627f996e7/oxide-nexus:default.log.1711677599:
thread 'tokio-runtime-worker' panicked at nexus/src/app/background/sync_switch_configuration.rs:735:26:
bgp config is present but announce set is not populated
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[ Mar 28 23:14:43 Stopping because all processes in service exited. ]
[ Mar 28 23:14:43 Executing stop method (:kill). ]
The assertion is here.
cc @internet-diglett, to whom git blame attributes this code, and @rcgoodfellow for the nearby TODO.
Note that this coincided with some network flakiness that was going on around that time (2024-03-28T23:14:41.638474412Z).
I believe CRDB may have been unavailable during this time?
Looks like it; just above the panic I see:
23:14:39.337Z ERRO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): failed to collect inventory
background_task = service_zone_nat_tracker
error = Service Unavailable: Failed to access DB connection: Timed out in bb8
file = nexus/src/app/background/sync_service_zone_nat.rs:71
...
23:14:41.465Z WARN 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): failed to read DNS config
background_task = dns_config_internal
current_generation = 1
current_time_created = 2023-08-30 18:59:10.774294 UTC
dns_group = internal
error = Service Unavailable: Failed to access DB connection: Timed out in bb8
file = nexus/src/app/background/dns_config.rs:72
@sunshowers thanks for catching this; a few expect() calls snuck through. Patching this now.
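For reference, the general shape of such a patch is to replace a panicking expect() on a lookup with code that logs the inconsistency and skips the affected entry, so a transient DB outage cannot take down the whole Nexus process. The following is a minimal sketch of that pattern using only the Rust standard library. BgpConfig, AnnounceSet, pair_configs and all field names are hypothetical stand-ins, not the actual types in sync_switch_configuration.rs, and eprintln! stands in for Nexus's structured logger.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a BGP config row; field names are illustrative only.
#[derive(Debug, Clone)]
struct BgpConfig {
    id: u32,
    bgp_announce_set_id: u32,
}

/// Hypothetical stand-in for an announce set (prefixes to originate).
#[derive(Debug, Clone)]
struct AnnounceSet {
    prefixes: Vec<String>,
}

/// Pair each BGP config with its announce set. Instead of calling
/// `.expect("bgp config is present but announce set is not populated")`
/// on the lookup, which would panic the process, log the inconsistency
/// and skip that config. (Assumption: the background task runs again
/// later and can pick the config up once the data is consistent.)
fn pair_configs(
    configs: &[BgpConfig],
    announce_sets: &HashMap<u32, AnnounceSet>,
) -> Vec<(BgpConfig, AnnounceSet)> {
    let mut paired = Vec::new();
    for config in configs {
        match announce_sets.get(&config.bgp_announce_set_id) {
            Some(set) => paired.push((config.clone(), set.clone())),
            None => {
                // Graceful path: report and continue rather than panic.
                eprintln!(
                    "bgp config {} is present but announce set {} is not populated; skipping",
                    config.id, config.bgp_announce_set_id
                );
            }
        }
    }
    paired
}

fn main() {
    let configs = vec![
        BgpConfig { id: 1, bgp_announce_set_id: 10 },
        // This config's announce set is missing, e.g. because a DB read failed.
        BgpConfig { id: 2, bgp_announce_set_id: 20 },
    ];
    let mut sets = HashMap::new();
    sets.insert(
        10,
        AnnounceSet { prefixes: vec!["198.51.100.0/24".to_string()] },
    );

    let paired = pair_configs(&configs, &sets);
    // Only the config with a populated announce set survives; no panic occurs.
    assert_eq!(paired.len(), 1);
    assert_eq!(paired[0].0.id, 1);
    assert_eq!(paired[0].1.prefixes.len(), 1);
}
```

Whether skipping, retrying, or returning an error to the caller is the right choice depends on how the real task reconciles switch state; the point of the sketch is only that a missing row becomes a handled case rather than a process-wide panic.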