API errors shouldn't be treated as probe errors

Question


alexey-yarmosh opened this issue 10 months ago.

Currently, if our API fails while collecting probe metadata (database problems, bugs in our code, etc.), it sends the "failed to collect probe metadata" message to the probe, which forces the probe to wait 1 hour before reconnecting. Instead, we should delay reconnection by 1 hour only when we know for certain that the problem lies with the probe itself. When the API is the root cause of the error, the probe should reconnect immediately.
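One way to express this on the API side is to make "probe fault" an explicit, opt-in error type and treat every other failure as an API fault. The sketch below assumes a hypothetical `ProbeError` class that the metadata code would throw only for confirmed probe-side problems, and a hypothetical `API_FAULT_MESSAGE` string for the new "retry immediately" case; only the "failed to collect probe metadata" text comes from the issue.

```typescript
// Hypothetical error class: thrown only when the failure is known to be
// caused by the probe itself (e.g. invalid data the probe sent).
class ProbeError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ProbeError';
    // Keep `instanceof` working even when compiled to older JS targets.
    Object.setPrototypeOf(this, ProbeError.prototype);
  }
}

// Existing message from the issue: the probe treats it as "retry in 1 hour".
const PROBE_FAULT_MESSAGE = 'failed to collect probe metadata';
// Hypothetical new message for API-side failures: "retry immediately".
const API_FAULT_MESSAGE = 'api error while collecting probe metadata';

// Only a confirmed probe fault gets the message that triggers the 1-hour
// back-off. Anything else (DB errors, Redis adapter timeouts, bugs, or a
// non-Error value being thrown) is reported as an API-side failure.
function toDisconnectMessage(error: unknown): string {
  return error instanceof ProbeError ? PROBE_FAULT_MESSAGE : API_FAULT_MESSAGE;
}
```

Defaulting unknown errors to the API-fault message is the safer choice here: a misclassified probe fault costs an extra reconnect attempt, while a misclassified API fault costs the probe an hour offline.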

Example of an API error that caused the probe to disconnect for 1 hour:

```
Error: failed to collect probe metadata
    at file:///app/dist/lib/ws/middleware/probe-metadata.js:20:15
[2023-08-10 14:16:52] [ERROR] [76] [probe-metadata] timeout reached while waiting for fetchSockets response
Error: timeout reached while waiting for fetchSockets response
    at Timeout._onTimeout (/app/node_modules/@socket.io/redis-adapter/dist/index.js:568:28)
    at listOnTimeout (node:internal/timers:569:17)
    at process.processTimers (node:internal/timers:512:7)
[2023-08-10 14:16:52] [INFO] [76] [ws:error] disconnecting client 3kGpckjm0IzAEZ8fAAAT for (failed to collect probe metadata) [x.x.x.x]
[2023-08-10 14:16:52] [DEBUG] [76] [ws:error] failed to collect probe metadata
{ data: { ipAddress: 'x.x.x.x' } }
```
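In this log the root cause is a `fetchSockets` timeout inside `@socket.io/redis-adapter`, an API-side failure, yet the probe was sent "failed to collect probe metadata" and backed off for an hour. On the probe side, the fix amounts to keying the delay on the reason it receives. The sketch below is a minimal illustration under assumptions: `getReconnectDelayMs` and `scheduleReconnect` are hypothetical helpers, and any reason other than the existing probe-fault message is treated as "reconnect immediately", as the issue proposes.

```typescript
// Existing message from the issue, sent only for confirmed probe faults.
const PROBE_FAULT_MESSAGE = 'failed to collect probe metadata';

const ONE_HOUR_MS = 60 * 60 * 1000;
// The issue asks for an immediate reconnect when the API is at fault.
const API_FAULT_DELAY_MS = 0;

// Hypothetical helper: map the disconnect reason to a reconnect delay.
function getReconnectDelayMs(reason: string): number {
  return reason === PROBE_FAULT_MESSAGE ? ONE_HOUR_MS : API_FAULT_DELAY_MS;
}

// Hypothetical helper: schedule the reconnect and return the timer handle
// so the caller can cancel it (e.g. on shutdown).
function scheduleReconnect(
  reason: string,
  reconnect: () => void,
): ReturnType<typeof setTimeout> {
  return setTimeout(reconnect, getReconnectDelayMs(reason));
}
```

If the probe uses the Socket.IO client, this would presumably be called from its `connect_error` handler with the error's `message` as the reason; how the probe is actually wired is an assumption here.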