API errors shouldn't be treated as probe errors
alexey-yarmosh opened this issue
Currently, if our API fails while collecting probe metadata (DB problems, code problems, etc.), it sends the "failed to collect probe metadata" message to the probe, which forces the probe to wait 1 hour before reconnecting. Instead, the probe should wait 1 hour only when we know for certain that the problem is with the probe itself. When the API is the root cause of the error, the probe should reconnect immediately.
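One possible way to express that distinction, as a rough sketch only: tag errors that are known to be caused by the probe with a dedicated error class, and treat everything else as an API-side failure. The names `ProbeError` and `classifyConnectionError`, and the shape of the returned object, are hypothetical and do not come from the current codebase.

```javascript
// Hypothetical error class: thrown only when we are sure the probe itself is at fault.
class ProbeError extends Error {
  constructor(message) {
    super(message);
    this.name = 'ProbeError';
  }
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: decides how long the probe should wait before reconnecting.
// Anything that is not a ProbeError (timeouts, DB failures, bugs) is assumed to be
// an API problem, so the probe is allowed to reconnect immediately.
function classifyConnectionError(error) {
  if (error instanceof ProbeError) {
    return { source: 'probe', retryDelayMs: ONE_HOUR_MS };
  }

  return { source: 'api', retryDelayMs: 0 };
}
```

With this shape, the `fetchSockets` timeout from the log below would be a plain `Error`, so it would be classified as an API error and would not trigger the 1 hour delay.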
Example of an API error, which caused the probe to disconnect for 1 hour:
Error: failed to collect probe metadata
at file:///app/dist/lib/ws/middleware/probe-metadata.js:20:15
[2023-08-10 14:16:52] [ERROR] [76] [probe-metadata] timeout reached while waiting for fetchSockets response
Error: timeout reached while waiting for fetchSockets response
at Timeout._onTimeout (/app/node_modules/@socket.io/redis-adapter/dist/index.js:568:28)
at listOnTimeout (node:internal/timers:569:17)
at process.processTimers (node:internal/timers:512:7)
[2023-08-10 14:16:52] [INFO] [76] [ws:error] disconnecting client 3kGpckjm0IzAEZ8fAAAT for (failed to collect probe metadata) [x.x.x.x]
[2023-08-10 14:16:52] [DEBUG] [76] [ws:error] failed to collect probe metadata
{ data: { ipAddress: 'x.x.x.x' } }