slackapi / node-slack-sdk

Slack Developer Kit for Node.js

Home Page:https://slack.dev/node-slack-sdk

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

@slack/socket-mode - Handle event.type === disconnect && reason === 'too_many_websockets'

jhankim opened this issue · comments

commented

👋 from Slack Customer Success.

Current state of @slack/socket-mode package does not account for event.type === disconnect && reason === 'too_many_websockets', causing unhandled error to bubble up when Slack decides to disconnect existing connection during connecting:handshaking state

Here's a log output - the reconnect happens initially after the app receives event.type === 'disconnect' && reason === 'warning'

Packages:

Select all that apply:

  • @slack/web-api
  • @slack/rtm-api
  • @slack/webhooks
  • @slack/oauth
  • @slack/socket-mode
  • @slack/types
  • I don't know

Requirements

Feature request: have the app catch reason === 'too_many_websockets'

Please read the Contributing guidelines and Code of Conduct before creating this issue or pull request. By submitting, you are agreeing to those rules.

Hello! Thanks for reporting.

What would the expectation be in this scenario? What should @slack/socket-mode do when it encounters this error?

commented

Forgot the log... 😅

2023-08-29T02:17:33.583983751Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Received a message on the WebSocket: 
{
    "type": "disconnect",
    "reason": "warning",
    "debug_info": {
        "host": "applink-13"
    }
}
2023-08-29T02:17:33.585261952Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Received "disconnect" (warning) message - creating the second connection
2023-08-29T02:17:33.585795905Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Cancelled the job waiting for ping from Slack
2023-08-29T02:17:33.586029749Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Terminated the heart beat job
2023-08-29T02:17:33.586265381Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: reconnecting
2023-08-29T02:17:33.586691298Z stdout F �[32minfo�[39m:[app]:2023-08-29 02:17:33: Reconnecting to Slack ...
2023-08-29T02:17:33.587639859Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting
2023-08-29T02:17:33.587987683Z stdout F �[32minfo�[39m:[app]:2023-08-29 02:17:33: Going to establish a new connection to Slack ...
2023-08-29T02:17:33.588221708Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:authenticating
2023-08-29T02:17:33.588430306Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Going to retrieve a new WSS URL ...
2023-08-29T02:17:33.588655237Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: apiCall('apps.connections.open') start
2023-08-29T02:17:33.590704154Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request url: https://slack.com/api/apps.connections.open
2023-08-29T02:17:33.59093626Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request body: 
{}
2023-08-29T02:17:33.591177851Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request headers: 
{}
2023-08-29T02:17:33.642127756Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http response received
2023-08-29T02:17:33.643022349Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request result: 
{
    "ok": true,
    "url": "wss://wss-primary.slack.com/link/?ticket=xxxx&app_id=xxxx",
    "response_metadata": {
        "scopes": [
            "connections:write"
        ],
        "acceptedScopes": [
            "connections:write"
        ]
    }
}
2023-08-29T02:17:33.643565667Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: apiCall('apps.connections.open') end
2023-08-29T02:17:33.643806343Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:authenticated
2023-08-29T02:17:33.666075231Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:handshaking
2023-08-29T02:17:33.672679037Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Received a message on the WebSocket: 
{
    "type": "disconnect",
    "reason": "too_many_websockets",
    "debug_info": {
        "host": "applink-6"
    }
}
2023-08-29T02:17:33.673997382Z stdout F �[31merror�[39m:[app]:2023-08-29 02:17:33: An unhandled error occurred while Bolt processed (type: undefined, error: TypeError: Cannot read properties of undefined (reading 'event'))
2023-08-29T02:17:33.674576814Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Error details: TypeError: Cannot read properties of undefined (reading 'event'), retry num: undefined, retry reason: undefined
2023-08-29T02:17:33.676869527Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: reconnecting
2023-08-29T02:17:33.676971669Z stdout F �[32minfo�[39m:[app]:2023-08-29 02:17:33: Reconnecting to Slack ...
2023-08-29T02:17:33.67719043Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting
2023-08-29T02:17:33.67730345Z stdout F �[32minfo�[39m:[app]:2023-08-29 02:17:33: Going to establish a new connection to Slack ...
2023-08-29T02:17:33.677425795Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:authenticating
2023-08-29T02:17:33.67753144Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Going to retrieve a new WSS URL ...
2023-08-29T02:17:33.677637366Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: apiCall('apps.connections.open') start
2023-08-29T02:17:33.678225039Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request url: https://slack.com/api/apps.connections.open
2023-08-29T02:17:33.678302499Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request body: 
{}
2023-08-29T02:17:33.678434963Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request headers: 
{}
2023-08-29T02:17:33.706399893Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http response received
2023-08-29T02:17:33.706740691Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request result: 
{
    "ok": true,
    "url": "wss://wss-primary.slack.com/link/?ticket=xxxx&app_id=xxxx",
    "response_metadata": {
        "scopes": [
            "connections:write"
        ],
        "acceptedScopes": [
            "connections:write"
        ]
    }
}
2023-08-29T02:17:33.706905114Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: apiCall('apps.connections.open') end
2023-08-29T02:17:33.707562051Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:authenticated
2023-08-29T02:17:33.726924532Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:handshaking
2023-08-29T02:17:33.735835811Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Received a message on the WebSocket: 
{
    "type": "disconnect",
    "reason": "too_many_websockets",
    "debug_info": {
        "host": "applink-3"
    }
}
2023-08-29T02:17:33.736517499Z stdout F �[31merror�[39m:[app]:2023-08-29 02:17:33: An unhandled error occurred while Bolt processed (type: undefined, error: TypeError: Cannot read properties of undefined (reading 'event'))
2023-08-29T02:17:33.73664402Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Error details: TypeError: Cannot read properties of undefined (reading 'event'), retry num: undefined, retry reason: undefined
2023-08-29T02:17:33.737626433Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: reconnecting
2023-08-29T02:17:33.737761541Z stdout F �[32minfo�[39m:[app]:2023-08-29 02:17:33: Reconnecting to Slack ...
2023-08-29T02:17:33.738044197Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting
2023-08-29T02:17:33.738061619Z stdout F �[32minfo�[39m:[app]:2023-08-29 02:17:33: Going to establish a new connection to Slack ...
2023-08-29T02:17:33.738211885Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:authenticating
2023-08-29T02:17:33.738315455Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Going to retrieve a new WSS URL ...
2023-08-29T02:17:33.738410664Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: apiCall('apps.connections.open') start
2023-08-29T02:17:33.73889481Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request url: https://slack.com/api/apps.connections.open
2023-08-29T02:17:33.738986954Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request body: 
{}
2023-08-29T02:17:33.739225569Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request headers: 
{}
2023-08-29T02:17:33.765901421Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http response received
2023-08-29T02:17:33.766275841Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request result: 
{
    "ok": true,
    "url": "wss://wss-primary.slack.com/link/?ticket=xxxx&app_id=xxxx",
    "response_metadata": {
        "scopes": [
            "connections:write"
        ],
        "acceptedScopes": [
            "connections:write"
        ]
    }
}
2023-08-29T02:17:33.766585039Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: apiCall('apps.connections.open') end
2023-08-29T02:17:33.766896018Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:authenticated
2023-08-29T02:17:33.788489913Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:handshaking
2023-08-29T02:17:33.797143119Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Received a message on the WebSocket: 
{
    "type": "hello",
    "num_connections": 10,
    "debug_info": {
        "host": "applink-8",
        "build_number": 38,
        "approximate_connection_time": 18060
    },
    "connection_info": {
        "app_id": "XXXX"
    }
}
2023-08-29T02:17:33.79851267Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connected
2023-08-29T02:17:33.798891461Z stdout F �[32minfo�[39m:[app]:2023-08-29 02:17:33: Now connected to Slack
2023-08-29T02:17:33.799282016Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connected:preparing
2023-08-29T02:17:33.799645249Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Switching to the secondary connection ...
2023-08-29T02:17:33.799881795Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Switched to the secondary connection
2023-08-29T02:17:33.800372396Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Terminated the old connection
2023-08-29T02:17:33.800880064Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Started running a new heart beat job
2023-08-29T02:17:33.801246493Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connected:ready
commented

The expected behavior would be for the app to catch the error when it receives:

{
  type: 'disconnect',
  reason: 'too_many_websockets'
  ...
}

OK, so catch the error and I guess not attempt to re-connect?

commented

@filmaj I think the reconnect behavior is still necessary given the app would enter into a disconnected state forever. It's worth noting that in my app logs, I only see reason: 'too_many_websockets' only as a result of the app trying to reconnect after it receives reason: 'warning'.

I believe we only allow 10 concurrent websocket connections per app, so if Slack is returning "too many websockets", it makes sense to me to reduce the amount of reconnect attempts - otherwise the warning/error will continue to be raised, and we would essentially be stampeding on the apps.connection.open API.

That's what I'm trying to tease out asking these questions and having this discussion: what should the proper behaviour be? 🤔

commented

Ah I see - I wasn't thinking too clearly on the implications of max limit imposed by the platform. I agree with your suggestion, the app should catch the error, then enter disconnected state. At this point the app owner can decide further logic (reboot the app via healthcheck, etc)

👋 It looks like this issue has been open for 30 days with no activity. We'll mark this as stale for now, and wait 10 days for an update or for further comment before closing this issue out. If you think this issue needs to be prioritized, please comment to get the thread going again! Maintainers also review issues marked as stale on a regular basis and comment or adjust status if the issue needs to be reprioritized.

As this issue has been inactive for more than one month, we will be closing it. Thank you to all the participants! If you would like to raise a related issue, please create a new issue which includes your specific details and references this issue number.

any updates? have the same problem

Apologies for no updates here and that I disappeared... let me look into this again after all this time. 🙇

Thank you)
Problem is that we have a bot and once per week it goes down due this issue. We replace token and repeat...

@dbudim hmm, by changing your token you avoid the issue? That doesn't make sense to me.. which token are you replacing? Your bot token or your application token? Do you have more than one instance of the app running?

@dbudim hmm, by changing your token you avoid the issue? That doesn't make sense to me.. which token are you replacing? Your bot token or your application token? Do you have more than one instance of the app running?

Single instance of app, replacing application token and problem is gone for a week😅

@dbudim do the logs look similar to what the original poster posted? As in, first you see a "type": "disconnect","reason": "warning", followed by "type": "disconnect","reason": "too_many_websockets", ?

WebSocketConnectReadThread-677874] 2023-10-25 09:45:50 [INFO ] [SocketModeClient:261] - New session is open
[WebSocketConnectReadThread-677874] 2023-10-25 09:45:50 [DEBUG] [SocketModeClient:384] - Socket Mode Request:

{
  "type": "disconnect",
  "reason": "too_many_websockets",
  "debug_info": {
    "host": "applink-6"
  }
}

[WebSocketConnectReadThread-677874] 2023-10-25 09:45:50 [INFO ] [SocketModeClient:272] - onClose listener is called (code: 1006, reason: )
[WebSocketConnectReadThread-677874] 2023-10-25 09:45:50 [DEBUG] [DetailedLoggingListener:84] -

Yes, same situation

java SDK:

<dependency>
       <groupId>com.slack.api</groupId>
      <artifactId>bolt</artifactId>
      <version>1.34.0</version>
</dependency>
<dependency>
       <groupId>com.slack.api</groupId>
       <artifactId>bolt-socket-mode</artifactId>
       <version>1.34.0</version>
</dependency>

Oh, you are using the Java SDK, good to know.

I've posted your issue to the java-slack-sdk repo: slackapi/java-slack-sdk#1223

Best to follow that issue along since the code for the node socket-mode library is different from the java bits.

@seratch since I am looking at doing a major new release of the socket mode package, do you have any thoughts on this issue? If Slack raises "too many websockets" - what should this program do? I feel like continuing to attempt to reconnect would be creating unnecessary, likely-to-fail connection attempts. Do you have opinions on what the package should do?

I started to address this in a draft PR: #1762

@seratch do you have opinions on how our socket mode packages should handle "too many websockets" errors, at a high level? The node one does not handle it at all and I believe it can cause problems.

@filmaj thank you for your detailed investigation of the scenario. I agree handling the 'too_many_websockets' reason pattern in a proper way could be a nice improvement. In reality, the only action SocketModeClient can take is to patiently wait until the Slack server permits a new connection. That being said, modifying the behavior in this manner should certainly be better than currently treating the pattern as a "slack_event" for sure.

The fix has now been released to mainline in 1.3.5 and is out of release candidacy. It will also be present in the soon-to-come 2.0 release of @slack/socket-mode.

@filmaj I seem to be hitting the following now on 1.3.5:

[INFO]  socket-mode:SocketModeClient:0 Going to establish a new connection to Slack ...
/home/patrick/slack/node_modules/finity/lib/core/StateMachine.js:76
      throw new Error('Unhandled event \'' + event + '\' in state \'' + this.currentState + '\'.');
            ^

Error: Unhandled event 'server explicit disconnect' in state 'connecting'.
    at StateMachine.handleUnhandledEvent (/home/patrick/slack/node_modules/finity/lib/core/StateMachine.js:76:13)
    at /home/patrick/slack/node_modules/finity/lib/core/HierarchicalStateMachine.js:79:33
    at TaskScheduler.execute (/home/patrick/slack/node_modules/finity/lib/core/TaskScheduler.js:29:7)
    at TaskScheduler.enqueue (/home/patrick/slack/node_modules/finity/lib/core/TaskScheduler.js:19:12)
    at HierarchicalStateMachine.handle (/home/patrick/slack/node_modules/finity/lib/core/HierarchicalStateMachine.js:72:24)
    at SocketModeClient.onWebSocketMessage (/home/patrick/slack/node_modules/@slack/socket-mode/dist/SocketModeClient.js:608:31)
    at WebSocket.onMessage (/home/patrick/slack/node_modules/ws/lib/event-target.js:132:16)
    at WebSocket.emit (node:events:518:28)
    at Receiver.receiverOnMessage (/home/patrick/slack/node_modules/ws/lib/websocket.js:1068:20)
    at Receiver.emit (node:events:518:28)

With debug logging it shows:

[DEBUG]  socket-mode:SocketModeClient:0 Received a message on the WebSocket: {"type":"disconnect","reason":"too_many_websockets","debug_info":{"host":"applink-2"}}
[DEBUG]  socket-mode:SocketModeClient:0 Received "disconnect" (reason: too_many_websockets) message - will attempt reconnect

@MadrMan I filed #1787 to track this.

Worth noting that, while this is a legit problem and we will address it in this package, you should also consider why Slack is telling you you have too many parallel websocket connections open with Slack (10 is the maximum allowed). If you can get the number of parallel websocket connections for this app down under 10, you would mitigate seeing this issue.