coder / vscode-coder

Open any Coder workspace in VS Code with a single click.

Repository from GitHub: https://github.com/coder/vscode-coder

Add automatic WebSocket reconnection with exponential backoff

EhabY opened this issue

WebSocket streams occasionally drop due to client network loss or server-side interruptions. Today, many parts of the app rely on one-way WebSockets for push updates, and a closed socket isn’t re-established unless the user refreshes the window. This leads to stale UIs (e.g., the agent metadata status bar showing an unknown error) until a manual reload.

2025-09-25 17:27:02.604 [warning] Failed to query metadata: WebSocket closed unexpectedly: undefined undefined

Proposal:
Introduce an automatic reconnection policy for all one-way WebSocket clients:

  • Reconnect with exponential backoff + jitter, capped (e.g., 250ms → 1s → 2s … up to 30s).
  • Stop retrying on non-recoverable close codes (auth errors, 4xx) and surface a clear message.
  • Resume prior subscriptions/state on reconnect; reset backoff after stable activity.
  • Log reconnection attempts.
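
The retry policy in the bullets above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the extension's actual implementation: the names `BackoffOptions`, `backoffDelay`, and `isRecoverable` are hypothetical, the schedule uses plain doubling from 250ms up to the 30s cap, and the choice of which close codes count as non-recoverable (1000 and 1008 here) is an assumption.

```typescript
// Hypothetical helper names; a sketch of the proposed policy only.
interface BackoffOptions {
  baseMs: number; // first delay ceiling (250ms in the proposal)
  maxMs: number; // upper cap (30s in the proposal)
  factor: number; // growth per attempt (doubling assumed here)
}

const defaultBackoff: BackoffOptions = { baseMs: 250, maxMs: 30_000, factor: 2 };

// Exponential backoff with "equal jitter": half of the capped delay is fixed,
// the other half is randomized so that many clients do not retry in lockstep.
function backoffDelay(
  attempt: number,
  opts: BackoffOptions = defaultBackoff,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(opts.maxMs, opts.baseMs * Math.pow(opts.factor, attempt));
  const half = ceiling / 2;
  return Math.floor(half + random() * half);
}

// Decide whether a failed or closed socket is worth retrying.
// Assumption: a 4xx status during the handshake (auth errors and similar)
// is permanent, as are a normal closure (1000) and a policy violation (1008).
function isRecoverable(event: { handshakeStatus?: number; closeCode?: number }): boolean {
  const { handshakeStatus, closeCode } = event;
  if (handshakeStatus !== undefined && handshakeStatus >= 400 && handshakeStatus < 500) {
    return false;
  }
  if (closeCode === 1000 || closeCode === 1008) {
    return false;
  }
  return true;
}

// Usage: log the delay that would be used for the first few attempts.
for (let attempt = 0; attempt < 8; attempt++) {
  console.log(`attempt ${attempt}: retry in ${backoffDelay(attempt)}ms`);
}
```

A caller would reset `attempt` to 0 once the socket has been stable for some time, and stop scheduling retries as soon as `isRecoverable` returns `false`.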

Expected outcome:
Sockets recover transparently from transient failures; users don’t need to refresh to restore live updates.

I can confirm this annoying warning message. It pops up on Linux, Windows, and macOS.

Image

In Coder logs I saw this error too:

2025-10-08 10:10:37.291 [error] ✗ WS 8f11XXXX error /api/v2/workspaceagents/XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/watch-metadata-ws 440ms - Unexpected server response: 200 undefined
2025-10-08 10:10:37.292 [warning] Failed to query metadata: Unexpected server response: 200
2025-10-08 10:10:37.292 [warning] Failed to query metadata: WebSocket closed unexpectedly: undefined undefined

A manual downgrade to release 1.10.1 helps. Every later version shows the message in the bottom bar. This may confuse our devs, as it obviously has no impact on functionality.

1.11.0 replaces the Server-sent events (SSEs) with WebSocket (WS) connections.

  1. Is there a reason why the WS connection fails on your end (perhaps a firewall)? Is it consistent or just random errors?
  2. What version of the Coder Server is used?
  3. Are all the other WS connections working fine? (there's a few besides this one, you can check the logs to see)
  4. The Coder Workspaces tree view shows the agent status as well. Does it work from there, or does it show an error too?

I'm trying to determine if the WS itself is consistently failing or just an occasional failure that shouldn't be so prominent (along with retrying as the ticket suggests).

Thanks for the bug report!

We are behind an Azure App Proxy, a WAF, and a firewall. This means we are using --header-command to inject the needed token. The warning is consistent: it pops up immediately and does not go away when the VS Code connection is reloaded.

Currently we are a bit behind regarding the server version. We are using v2.22.1.

There were also some undefined error logs:

2025-10-08 10:02:04.127 [error] Unexpected server response: 200

Let me know if I can help you further

v2.22.0 supports WebSockets for those types of requests so the server version seems to be okay.

A few things to help me narrow this down. If you open the Coder Remote View Container (see image):

Image
  • Do you see the agent metadata or is it an error? Does the metadata show in 1.10.1?
  • Open the logs and set the log level to "trace". Do you see any failures (or successes) for /api/v2/workspaces/<workspace.id>/watch-ws, /api/v2/workspaceagents/<agentId>/watch-metadata-ws, /api/v2/notifications/inbox/watch, or /api/v2/workspacebuilds/<buildId>/logs (besides the one you provided already)?

I see in the UI the same error:

Image

When I downgrade to 1.10.1, the agent metadata is shown successfully.

Maybe I can try on Friday to set the log level to trace.

@EhabY, do we not use the coder.headerCommand when making this WS connection? If the correct header command is set in the Coder extension, it should work even behind an App proxy.

@matifali I think you are correct, the headers were not propagated correctly for WS connections (none of them). Nice catch!

This was my first thought, too. It is the reason I described our setup.

Maybe it is worth writing down a rule for gen AI to check that every HTTP connection to Coder uses the header command.

Thanks for the help! I had already unified all HTTP connections in a single place but WebSockets never used the header command so they were always broken. We just noticed now because it's very visible with the status item.
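
To illustrate the kind of fix described here, the sketch below builds connection options for both transports in one place, so a WebSocket cannot be opened without the headers that plain HTTP requests receive. It is not the code that shipped in the extension. `parseHeaderLines` and `buildConnectionOptions` are hypothetical names, `coder.example.com` and `X-Example-Token` are placeholders, and the sketch assumes the header command prints one `key=value` pair per line.

```typescript
// Hypothetical sketch: one code path that produces headers for HTTP and WS.

// Parse the header command's stdout, assumed to be "key=value" lines.
// Only the first "=" splits the pair, so values may contain "=" themselves.
function parseHeaderLines(output: string): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const rawLine of output.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // ignore blank lines
    const idx = line.indexOf("=");
    if (idx <= 0) {
      throw new Error(`Malformed header line: ${line}`);
    }
    headers[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return headers;
}

interface ConnectionOptions {
  url: string;
  headers: Record<string, string>;
}

// Build the URL and headers for either transport from the same inputs.
function buildConnectionOptions(
  baseUrl: string,
  path: string,
  headers: Record<string, string>,
  transport: "http" | "ws",
): ConnectionOptions {
  let url = baseUrl.replace(/\/$/, "") + path;
  if (transport === "ws") {
    // http:// becomes ws:// and https:// becomes wss://
    url = url.replace(/^http/, "ws");
  }
  return { url, headers: { ...headers } };
}

// Usage with placeholder values.
const headers = parseHeaderLines("X-Example-Token=abc123\n");
const wsOptions = buildConnectionOptions(
  "https://coder.example.com",
  "/api/v2/workspaceagents/<agentId>/watch-metadata-ws",
  headers,
  "ws",
);
console.log(wsOptions.url, Object.keys(wsOptions.headers));
```

With the Node `ws` package, a header map like this can be passed through the `headers` option of the client constructor. Whether the extension wires it up exactly this way is an assumption.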

This is a very good use-case for GenAI to double check 👀

Sorry it took a while to release this fix, but it should work now if you update to v1.11.3 @MrPeacockNLB 🙏