π BUG: Unable to reconnect after server crash
Cyberes opened this issue Β· comments
What version of nebula
are you using? (nebula -version
)
1.8.2
What operating system are you using?
Linux
Describe the Bug
One of my servers has been unstable and will sometimes become unresponsive, requiring a hard reset (this is host A, 172.0.3.109
). When it comes back up, it's unable to establish a connection to my node that is a nginx reverse proxy (this is host B, 172.0.2.120
). If I manually restart the Nebula service on host A, it fixes the issue and host A can connect to host B.
Logs from affected hosts
Lighthouse
Apr 22 12:13:31 cloud-proxy nebula[689]: time="2024-04-22T12:13:31-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:31 cloud-proxy nebula[689]: time="2024-04-22T12:13:31-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:31 cloud-proxy nebula[689]: time="2024-04-22T12:13:31-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:31 cloud-proxy nebula[689]: time="2024-04-22T12:13:31-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:31 cloud-proxy nebula[689]: time="2024-04-22T12:13:31-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:31 cloud-proxy nebula[689]: time="2024-04-22T12:13:31-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:32 cloud-proxy nebula[689]: time="2024-04-22T12:13:32-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:32 cloud-proxy nebula[689]: time="2024-04-22T12:13:32-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:32 cloud-proxy nebula[689]: time="2024-04-22T12:13:32-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:32 cloud-proxy nebula[689]: time="2024-04-22T12:13:32-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:33 cloud-proxy nebula[689]: time="2024-04-22T12:13:33-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:33 cloud-proxy nebula[689]: time="2024-04-22T12:13:33-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:34 cloud-proxy nebula[689]: time="2024-04-22T12:13:34-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:34 cloud-proxy nebula[689]: time="2024-04-22T12:13:34-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Host A
Apr 22 12:12:25 host-a nebula[811]: time="2024-04-22T12:12:25-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:26 host-a nebula[811]: time="2024-04-22T12:12:26-00:00" level=info msg="Tunnel status" certName=host-b localIndex=804429705 remoteIndex=3094684430 tunnelCheck="map[method:active state:dead]" v>
Apr 22 12:12:26 host-a nebula[811]: time="2024-04-22T12:12:26-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:26 host-a nebula[811]: time="2024-04-22T12:12:26-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:26 host-a nebula[811]: time="2024-04-22T12:12:26-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:26 host-a nebula[811]: time="2024-04-22T12:12:26-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:27 host-a nebula[811]: time="2024-04-22T12:12:27-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:27 host-a nebula[811]: time="2024-04-22T12:12:27-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:27 host-a nebula[811]: time="2024-04-22T12:12:27-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:27 host-a nebula[811]: time="2024-04-22T12:12:27-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:28 host-a nebula[811]: time="2024-04-22T12:12:28-00:00" level=info msg="sendNoMetrics failed to find HostInfo" certName=host-b error="unable to find host with relay" localIndex=2480024129 rela>
Apr 22 12:12:28 host-a nebula[811]: time="2024-04-22T12:12:28-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:28 host-a nebula[811]: time="2024-04-22T12:12:28-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:28 host-a nebula[811]: time="2024-04-22T12:12:28-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:28 host-a nebula[811]: time="2024-04-22T12:12:28-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Taking new handshake" certName=host-b localIndex=2480024129 remoteIndex=108412485 vpnIp=172.0.2.120
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message sent" certName=host-b fingerprint=...>
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="sendNoMetrics failed to find HostInfo" certName=host-b error="unable to find host with relay" localIndex=817659693 relay>
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:23 host-a nebula[811]: time="2024-04-22T12:10:23-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:23 host-a nebula[811]: time="2024-04-22T12:10:23-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:23 host-a nebula[811]: time="2024-04-22T12:10:23-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Tunnel status" certName=host-b localIndex=1362300847 remoteIndex=2069131475 tunnelCheck="map[method:active state:dead]" >
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:26 host-a nebula[811]: time="2024-04-22T12:10:26-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:26 host-a nebula[811]: time="2024-04-22T12:10:26-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:26 host-a nebula[811]: time="2024-04-22T12:10:26-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:26 host-a nebula[811]: time="2024-04-22T12:10:26-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:27 host-a nebula[811]: time="2024-04-22T12:10:27-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:27 host-a nebula[811]: time="2024-04-22T12:10:27-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:27 host-a nebula[811]: time="2024-04-22T12:10:27-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:27 host-a nebula[811]: time="2024-04-22T12:10:27-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:28 host-a nebula[811]: time="2024-04-22T12:10:28-00:00" level=info msg="sendNoMetrics failed to find HostInfo" certName=host-b error="unable to find host with relay" localIndex=2903481744 rela>
Apr 22 12:10:28 host-a nebula[811]: time="2024-04-22T12:10:28-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:28 host-a nebula[811]: time="2024-04-22T12:10:28-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:28 host-a nebula[811]: time="2024-04-22T12:10:28-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:28 host-a nebula[811]: time="2024-04-22T12:10:28-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Taking new handshake" certName=host-b localIndex=2903481744 remoteIndex=3824716788 vpnIp=172.0.2.120
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message sent" certName=host-b fingerprint=...>
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
This pattern repeats.
Host B
Apr 22 12:06:25 host-b nebula[111]: time="2024-04-22T12:06:25-00:00" level=info msg="Attempt to relay through hosts" localIndex=1813799298 relays="[172.0.0.2 172.0.0.3 172.0.0.2 172.0.0.3]" remoteIndex=0 vpnIp=1>
Apr 22 12:06:25 host-b nebula[111]: time="2024-04-22T12:06:25-00:00" level=info msg="Send handshake via relay" localIndex=1813799298 relay=172.0.0.2 remoteIndex=0 vpnIp=172.0.3.109
Apr 22 12:06:25 host-b nebula[111]: time="2024-04-22T12:06:25-00:00" level=info msg="Send handshake via relay" localIndex=1813799298 relay=172.0.0.3 remoteIndex=0 vpnIp=172.0.3.109
Apr 22 12:06:25 host-b nebula[111]: time="2024-04-22T12:06:25-00:00" level=info msg="Send handshake via relay" localIndex=1813799298 relay=172.0.0.2 remoteIndex=0 vpnIp=172.0.3.109
Apr 22 12:06:25 host-b nebula[111]: time="2024-04-22T12:06:25-00:00" level=info msg="Send handshake via relay" localIndex=1813799298 relay=172.0.0.3 remoteIndex=0 vpnIp=172.0.3.109
Apr 22 12:06:26 host-b nebula[111]: time="2024-04-22T12:06:26-00:00" level=info msg="Handshake timed out" durationNs=6152868332 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1813799298 localIndex=1813799>
Apr 22 12:06:30 host-b nebula[111]: time="2024-04-22T12:06:30-00:00" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3495069400 localIndex=3495069400 remoteIndex=0 u>
Apr 22 12:06:30 host-b nebula[111]: time="2024-04-22T12:06:30-00:00" level=info msg="Attempt to relay through hosts" localIndex=3495069400 relays="[172.0.0.2 172.0.0.3 172.0.0.2 172.0.0.3]" remoteIndex=0 vpnIp=1>
Apr 22 12:06:30 host-b nebula[111]: time="2024-04-22T12:06:30-00:00" level=info msg="Send handshake via relay" localIndex=3495069400 relay=172.0.0.2 remoteIndex=0 vpnIp=172.0.3.109
Apr 22 12:06:30 host-b nebula[111]: time="2024-04-22T12:06:30-00:00" level=info msg="Send handshake via relay" localIndex=3495069400 relay=172.0.0.3 remoteIndex=0 vpnIp=172.0.3.109
This pattern repeats
Config files from affected hosts
Host A
firewall:
conntrack:
default_timeout: 10m
max_connections: 100000
tcp_timeout: 12m
udp_timeout: 3m
inbound:
- host: any
port: any
proto: icmp
- host: any
port: any
proto: any
outbound:
- host: any
port: any
proto: any
lighthouse:
am_lighthouse: false
hosts:
- 172.0.0.2
- 172.0.0.3
interval: 60
listen:
batch: 128
host: 0.0.0.0
port: 0
read_buffer: 10485760
write_buffer: 10485760
logging:
format: text
level: info
pki:
ca: |
cert: |
key: |
punchy:
delay: 1s
punch: true
punch_back: true
respond: true
relay:
am_relay: false
relays:
- 172.0.0.2
- 172.0.0.3
use_relays: true
static_host_map:
172.0.0.2:
- ....:4242
172.0.0.3:
- ....:4242
tun:
dev: nebula1
disabled: false
drop_local_broadcast: false
drop_multicast: false
routes: null
tx_queue: 5000
unsafe_routes: null
Host B
firewall:
conntrack:
default_timeout: 10m
max_connections: 100000
tcp_timeout: 12m
udp_timeout: 3m
inbound:
- host: any
port: any
proto: icmp
- host: any
port: any
proto: any
outbound:
- host: any
port: any
proto: any
lighthouse:
am_lighthouse: false
hosts:
- 172.0.0.2
- 172.0.0.3
interval: 60
listen:
batch: 128
host: 0.0.0.0
port: 0
read_buffer: 10485760
write_buffer: 10485760
logging:
format: text
level: info
pki:
ca: |
cert: |
key: |
preferred_ranges:
- 10.0.0.0/24
punchy:
delay: 1s
punch: true
punch_back: true
respond: true
relay:
am_relay: false
relays:
- 172.0.0.2
- 172.0.0.3
use_relays: true
static_host_map:
172.0.0.2:
- ....:4242
172.0.0.3:
- ....:4242
tun:
dev: nebula1
disabled: false
drop_local_broadcast: false
drop_multicast: false
routes: null
tx_queue: 5000
unsafe_routes: null
Lighthouse
firewall:
conntrack:
default_timeout: 10m
max_connections: 100000
tcp_timeout: 12m
udp_timeout: 3m
inbound:
- host: any
port: any
proto: icmp
- host: any
port: any
proto: any
outbound:
- host: any
port: any
proto: any
lighthouse:
am_lighthouse: true
interval: 60
listen:
batch: 128
host: 0.0.0.0
port: 4242
logging:
format: text
level: debug
pki:
ca: |
cert: |
key: |
punchy:
delay: 1s
punch: true
punch_back: true
respond: true
relay:
am_relay: true
use_relays: false
stats:
host: ...:2003
interval: 10s
lighthouse_metrics: true
message_metrics: true
prefix: nebula.cloud_proxy
protocol: tcp
type: graphite
tun:
dev: nebula1
disabled: false
drop_local_broadcast: false
drop_multicast: false
routes: null
tx_queue: 5000
unsafe_routes: null
Also worth noting that this issue doesn't occur after a normal reboot.
On the surface this looks like a nebula IP reuse issue but there's so much redacted that it's hard to understand exactly what is happening.
Apr 22 12:13:31 cloud-proxy nebula[689]: time="2024-04-22T12:13:31-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
- Does the
<host's external IP>
ever change or is it always the same? - Does it actually say
index=0
?
Both Host A and Host B appear to be attempting to initiate a tunnel with the other but neither logs show a matching handshake despite Host A responding. This means we should see logs on Host A responding to handshakes initiated by Host B.
To help can you do the following:
- Ensure that multiple processes aren't running using the same nebula ip (the ip issued in the certificate)
- Provide more, unredacted logs from all hosts. You can send to my email if you are concerned about privacy nate@defined.net
Great, I've sent you an email.
And there aren't multiple processes running. It's controlled by systemd.