oxidecomputer / omicron

Omicron: Oxide control plane

panic in sync_switch_configuration.rs with "bgp config is present but announce set is not populated"

sunshowers opened this issue

commented

During today's dogfood mupdate, we found a core dump on gc08 (rsync'd over to /staff/dock/rack2/mupdate-20240329/cores/sled-08/core.oxz_nexus_65a11c18-7f59-41ac-b9e7-680627f996e7.nexus.5125.1711667683).

Based on timestamps, this corresponds to this message in the log file /pool/ext/8a199f12-4f5c-483a-8aca-f97856658a35/crypt/debug/oxz_nexus_65a11c18-7f59-41ac-b9e7-680627f996e7/oxide-nexus:default.log.1711677599:

thread 'tokio-runtime-worker' panicked at nexus/src/app/background/sync_switch_configuration.rs:735:26: 
bgp config is present but announce set is not populated
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[ Mar 28 23:14:43 Stopping because all processes in service exited. ]
[ Mar 28 23:14:43 Executing stop method (:kill). ]

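The assertion is at nexus/src/app/background/sync_switch_configuration.rs:735, the location given in the panic message above.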

cc @internet-diglett, who this code annotates to, and @rcgoodfellow for the nearby TODO.

commented

Note that this corresponded to some network flakiness that was going on around that time (2024-03-28T23:14:41.638474412Z).

I believe CRDB may have been unavailable during this time?

Looks like it. Just above the panic I see:

23:14:39.337Z ERRO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): failed to collect inventory
    background_task = service_zone_nat_tracker
    error = Service Unavailable: Failed to access DB connection: Timed out in bb8
    file = nexus/src/app/background/sync_service_zone_nat.rs:71
...
23:14:41.465Z WARN 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): failed to read DNS config
    background_task = dns_config_internal
    current_generation = 1
    current_time_created = 2023-08-30 18:59:10.774294 UTC
    dns_group = internal
    error = Service Unavailable: Failed to access DB connection: Timed out in bb8
    file = nexus/src/app/background/dns_config.rs:72

@sunshowers thanks for catching this. A few `expect` calls snuck through; patching this now.
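For anyone following along, here is a minimal sketch of the general shape such a patch could take: return an error instead of calling `expect`, so that one unpopulated announce set (for example while the database is unreachable) does not take down the whole process. This is not the actual omicron code or the actual fix. `BgpConfig`, `SyncError`, `prefixes_to_announce`, and `sync_once` are hypothetical names invented for illustration.

```rust
use std::fmt;

/// Hypothetical stand-in for a switch port's BGP settings.
#[derive(Debug, Clone)]
struct BgpConfig {
    name: String,
    /// `None` models an announce set that could not be loaded,
    /// e.g. because the database timed out.
    announce_set: Option<Vec<String>>,
}

/// Hypothetical error type for the sync pass.
#[derive(Debug, PartialEq)]
enum SyncError {
    MissingAnnounceSet { config: String },
}

impl fmt::Display for SyncError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SyncError::MissingAnnounceSet { config } => write!(
                f,
                "bgp config {config} is present but announce set is not populated"
            ),
        }
    }
}

/// Returns the prefixes to announce, or an error the caller can log.
/// The panicking version would be `.expect("... not populated")`.
fn prefixes_to_announce(config: &BgpConfig) -> Result<&[String], SyncError> {
    config
        .announce_set
        .as_deref()
        .ok_or_else(|| SyncError::MissingAnnounceSet {
            config: config.name.clone(),
        })
}

/// One pass of a hypothetical background task: configs with a missing
/// announce set are skipped and reported rather than aborting the pass.
fn sync_once(configs: &[BgpConfig]) -> (usize, Vec<SyncError>) {
    let mut applied = 0;
    let mut errors = Vec::new();
    for config in configs {
        match prefixes_to_announce(config) {
            Ok(_prefixes) => applied += 1, // a real task would push these to the switch
            Err(e) => errors.push(e),      // a real task would log and retry next pass
        }
    }
    (applied, errors)
}
```

The design choice is that a background task which runs periodically can treat a missing announce set as a transient condition: report it, skip that config, and pick it up on the next pass once the database is reachable again.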