Unhandled NatsError: DISCONNECTED
ArmorDarks opened this issue
- Client version: 2.8.0
- Node version: 16.14.2
Sometimes we receive the following unhandled rejection on our servers when running Nats.js:
NatsError: DISCONNECT
at Function.errorForCode (/code/node_modules/nats/lib/nats-base-client/error.js:100:16)
at /code/node_modules/nats/lib/nats-base-client/protocol.js:116:40
at Array.forEach (<anonymous>)
at ProtocolHandler.resetOutbound (/code/node_modules/nats/lib/nats-base-client/protocol.js:115:15)
at ProtocolHandler.prepare (/code/node_modules/nats/lib/nats-base-client/protocol.js:133:14)
at ProtocolHandler.<anonymous> (/code/node_modules/nats/lib/nats-base-client/protocol.js:179:31)
at Generator.next (<anonymous>)
at /code/node_modules/nats/lib/nats-base-client/protocol.js:8:71
at new Promise (<anonymous>)
at __awaiter (/code/node_modules/nats/lib/nats-base-client/protocol.js:4:12)
The error originates here: https://github.com/nats-io/nats.deno/blob/177c3da18319cbd0ec6066228e08f6709feb0511/nats-base-client/protocol.ts#L197
There are two issues with that:
- It happens inside NATS, and there's no way to catch the rejected promise, so on Node 16+ it crashes the whole server.
- It's unclear why it happens in the first place.
NATS config:
const connection = await connect({
  maxReconnectAttempts: -1,
  name: 'some-name',
  pass: 'some-pass',
  servers: ['...servers'],
  user: 'some-user',
  inboxPrefix: 'some.inbox',
})
What we tried:
- All NATS async methods are wrapped in try/catch blocks, so I believe the error is thrown somewhere in a callback and can't be caught.
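Since try/catch around our own calls cannot reach a promise that is rejected inside the library, a process-level filter is one possible stopgap until the client is upgraded. Below is a minimal sketch, not something taken from the nats package: `isNatsDisconnect` and `handleUnhandledRejection` are hypothetical helper names, and the check assumes the error carries `name === 'NatsError'` and `code === 'DISCONNECT'`, which is inferred from the `NatsError: DISCONNECT` line in the trace.

```typescript
// Hypothetical helper, not part of the nats package.
// Assumption: the rejection reason has name "NatsError" and code "DISCONNECT",
// as suggested by the "NatsError: DISCONNECT" line in the trace.
function isNatsDisconnect(reason: unknown): boolean {
  if (!(reason instanceof Error)) return false
  const code = (reason as Error & { code?: unknown }).code
  return reason.name === 'NatsError' && code === 'DISCONNECT'
}

// Decide what to do with a rejection that nobody handled.
// Returns true when the rejection was logged and swallowed;
// rethrows everything else so unrelated bugs still surface.
function handleUnhandledRejection(
  reason: unknown,
  log: (msg: string) => void,
): boolean {
  if (isNatsDisconnect(reason)) {
    log('ignored NatsError DISCONNECT from a pending request')
    return true
  }
  throw reason
}
```

In a Node service this could be registered once at startup with `process.on('unhandledRejection', (reason) => handleUnhandledRejection(reason, console.warn))`. Rethrowing everything else keeps the default crash-on-unhandled-rejection behavior for errors that are not this specific disconnect.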
Some observations:
- There are no other logs before or after that message.
- It seems to happen mostly when a NATS reconnect occurs on the server.
- I wasn't able to reproduce it locally despite doing many bad things to the NATS server and connection
- The last time we restarted one of the NATS nodes, it caused a reconnect on hundreds of our servers. Most of them didn't have any issues, but about 30% received that error, so some specific condition seems to trigger it.
In this case I realized that the trace is somewhat misleading: all it tells you is that a pending request (here, a pong) was rejected with the error. The contents of the trace at that point are useless to you. I added code to remove the stack from that error, because it can be confusing.
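To illustrate why the trace points into the reset path rather than at the caller, here is a rough sketch of the general pattern: requests waiting on a server reply are kept in a queue, and a disconnect rejects all of them at once. This is not the library's actual code. `Deferred`, `deferred`, and `PendingQueue` are hypothetical names, and a plain `Error` stands in for `NatsError`.

```typescript
// Hypothetical types and names; a simplified model, not the nats source.
interface Deferred<T> {
  promise: Promise<T>
  resolve: (value: T) => void
  reject: (err: Error) => void
}

function deferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void
  let reject!: (err: Error) => void
  const promise = new Promise<T>((res, rej) => {
    resolve = res
    reject = rej
  })
  return { promise, resolve, reject }
}

class PendingQueue {
  private pending: Deferred<void>[] = []

  // Register a request that waits for a server reply (e.g. a ping awaiting its pong).
  enqueue(): Promise<void> {
    const d = deferred<void>()
    this.pending.push(d)
    return d.promise
  }

  // Called on disconnect: reject everything still outstanding.
  // The Error is created here, so its stack describes the reset path,
  // not the code that originally issued the request.
  reset(): number {
    const stale = this.pending
    this.pending = []
    const err = new Error('DISCONNECT')
    stale.forEach((d) => d.reject(err))
    return stale.length
  }
}
```

If nothing has attached a `.catch` (or awaited inside a try/catch) to a promise returned by `enqueue()` by the time `reset()` runs, the rejection surfaces as an unhandled rejection, which would be consistent with the behavior reported above.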
Fixed by #524, nats-io/nats.deno#390, and #526
@aricart 👍 thank you