chipsenkbeil / distant

🚧 (Alpha stage software) Library and tooling that supports remote filesystem and process operations. 🚧

Home page: https://distant.dev

Server hangs

chipsenkbeil opened this issue

I'd seen this as part of #175, so the two may be related.

Recent changes to support batch processing seem to trigger this more frequently when writing. Specifically, writing via distant.nvim works fine, but once I quit Neovim and reconnect, the server appears to hang, so there may be a busy loop somewhere.

Points to investigate

Using tokio-console, I can see that some resources keep accumulating and are never dropped, which may point to a leak.

Manager while idle (no activity):

  • distant/distant-net/src/server.rs:183 (tokio::sync::rwlock)
  • distant/distant-net/src/server/ref.rs:57 (tokio::time::sleep) - many of these are created, about 10 per second even with no connections (this seems to be polling_wait)
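A steady ~10 sleeps per second on an idle manager would be consistent with a loop that re-checks a condition every 100 ms and creates a fresh timer on each pass. The std-only sketch below models that pattern. `poll_until` is a hypothetical stand-in, not distant's polling_wait, and std::thread::sleep stands in for tokio::time::sleep (each call to the latter creates a new Sleep, which appears to be what tokio-console is listing as a resource).

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical poll-based wait: checks `ready` every `interval`, up to
/// `timeout`, and returns (whether it became ready, how many sleeps it made).
/// Every loop iteration creates a brand-new sleep, so an idle wait produces
/// roughly `timeout / interval` timers.
fn poll_until(
    mut ready: impl FnMut() -> bool,
    interval: Duration,
    timeout: Duration,
) -> (bool, u32) {
    // Number of polls that fit in the timeout (guarding against a zero interval).
    let max_polls = (timeout.as_millis() / interval.as_millis().max(1)) as u32;
    let mut sleeps = 0;
    for _ in 0..max_polls {
        if ready() {
            return (true, sleeps);
        }
        thread::sleep(interval); // a new timer on every pass
        sleeps += 1;
    }
    // One final check after the last sleep.
    (ready(), sleeps)
}
```

With a 100 ms interval and nothing ever becoming ready, a loop like this keeps creating 10 timers per second for as long as it runs. An event-driven wait (for example a channel or tokio::sync::Notify) would create none while idle.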

After connecting a client to the manager:

  • distant/distant-net/src/client.rs:269 (tokio::time::sleep) - many of these are created
  • distant/distant-net/src/client.rs:397 (tokio::time::sleep) - many of these are created

Connection task never wakes up again:

  • distant/distant-net/src/server/connect:?? (tokio::task) - occasionally returns to running despite the idle and waker warnings (the path is too long to see the line number; possibly the tokio::spawn for handler.on_request on line 500)
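A task that "never wakes up again" is what tokio-console's lost-waker warning describes: the task is idle and nothing holds its waker. One common way to get there is a future (or hand-written poll function) that returns Poll::Pending without registering cx.waker(). The std-only sketch below illustrates that failure mode under that assumption; WaitForFlag, set_done, and wakeups_after_flag are hypothetical names, not code from distant.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Waker that counts how many times it has been woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// State shared between the waiting future and whoever completes it.
struct Shared {
    done: AtomicBool,
    waker: Mutex<Option<Waker>>,
}

/// Waits for `done`. With `store_waker == false` it returns Pending without
/// keeping the waker, so nothing can ever wake the task again.
struct WaitForFlag {
    shared: Arc<Shared>,
    store_waker: bool,
}

impl Future for WaitForFlag {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.shared.done.load(Ordering::SeqCst) {
            return Poll::Ready(());
        }
        if self.store_waker {
            // Correct: register interest so `set_done` can wake this task.
            *self.shared.waker.lock().unwrap() = Some(cx.waker().clone());
        }
        // Bug when store_waker == false: Pending with no registered waker.
        Poll::Pending
    }
}

/// Sets the flag and wakes whichever task registered a waker, if any.
fn set_done(shared: &Shared) {
    shared.done.store(true, Ordering::SeqCst);
    if let Some(w) = shared.waker.lock().unwrap().take() {
        w.wake();
    }
}

/// Polls once, sets the flag, and reports how many wakeups the task received.
fn wakeups_after_flag(store_waker: bool) -> usize {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let shared = Arc::new(Shared { done: AtomicBool::new(false), waker: Mutex::new(None) });
    let mut fut = WaitForFlag { shared: shared.clone(), store_waker };
    assert!(Pin::new(&mut fut).poll(&mut cx).is_pending());
    set_done(&shared);
    counter.0.load(Ordering::SeqCst)
}
```

wakeups_after_flag(true) receives one wakeup, while wakeups_after_flag(false) receives none even though the flag is set, which is the "idle with no waker" state the console flags.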

Screenshots

TASKS

[screenshot]

RESOURCES

[screenshot]

HTOP

[screenshot]

The high CPU usage shown here could be due to tracing.

Example of a sleep task that has lost its association:

[screenshot]

Compared with one that still has an association:

[screenshot]

Also note the large time value (in ms).

Manager server rwlock

[two screenshots]

Keychain writing

[screenshot]

There are two separate task sources calling RwLock::write. Could that be a problem?
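Two sources calling RwLock::write are not a problem by themselves, since writers simply queue. It turns into a hang if one writer keeps its guard while waiting on something that can only happen after the other writer proceeds. The std-only sketch below uses std::sync::RwLock and threads in place of tokio's RwLock and tasks (an assumption made so it runs without tokio), and the "keychain" is a hypothetical Vec<String>, not distant's type. It only shows that the second writer cannot get the lock while the first holds it.

```rust
use std::sync::mpsc;
use std::sync::{Arc, RwLock};
use std::thread;

/// Returns (could the second writer lock while the first held the guard,
///          could it lock after the first released the guard).
fn second_writer_blocked() -> (bool, bool) {
    let keychain = Arc::new(RwLock::new(Vec::<String>::new()));
    let (acquired_tx, acquired_rx) = mpsc::channel();
    let (release_tx, release_rx) = mpsc::channel::<()>();

    let first = {
        let keychain = Arc::clone(&keychain);
        thread::spawn(move || {
            let mut guard = keychain.write().unwrap();
            guard.push("first".to_string());
            acquired_tx.send(()).unwrap();
            // Keeps the write guard until told to release. If that signal
            // depended on the other writer making progress, this would deadlock.
            release_rx.recv().unwrap();
        })
    };

    acquired_rx.recv().unwrap();
    // The first writer holds the guard, so a non-blocking attempt fails.
    let while_held = keychain.try_write().is_ok();
    release_tx.send(()).unwrap();
    first.join().unwrap();
    // The guard was dropped when the first thread finished.
    let after_release = keychain.try_write().is_ok();
    (while_held, after_release)
}
```

If both keychain writers in the screenshot can end up in this "hold and wait on the other" shape, that would explain the lock never being released; whether they actually do is not established here.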

Once the hang occurs

[screenshot]

There is a flood of client and server connection sleep events. I'm wondering whether there's an issue with reconnecting.

The only sleep in the connection file happens while waiting for read or write availability when both reads and writes are blocked. This runs in a loop, so we could add logging here to report when the fully blocked state starts and ends.
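Logging only the edges of the fully blocked state keeps a flood of sleeps down to one "started" and one "ended" line instead of one line per iteration. A minimal std-only sketch of that idea; BlockedTracker and Transition are hypothetical names, and eprintln! stands in for whatever logging setup distant actually uses.

```rust
/// Change in the "fully blocked" state (both read and write would block).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Transition {
    BecameBlocked,
    Unblocked,
}

/// Hypothetical helper for a read/write loop: call `observe` once per pass
/// and it reports (and logs) only when the fully blocked state changes.
#[derive(Default)]
struct BlockedTracker {
    blocked: bool,
}

impl BlockedTracker {
    fn observe(&mut self, read_blocked: bool, write_blocked: bool) -> Option<Transition> {
        let now_blocked = read_blocked && write_blocked;
        let transition = match (self.blocked, now_blocked) {
            (false, true) => Some(Transition::BecameBlocked),
            (true, false) => Some(Transition::Unblocked),
            _ => None, // no change: stay quiet on every other iteration
        };
        self.blocked = now_blocked;
        if let Some(t) = transition {
            eprintln!("connection {:?}", t);
        }
        transition
    }
}
```

Calling observe(true, true) repeatedly logs BecameBlocked once, and the next pass where either side is ready logs Unblocked once.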

  • distant/distant-net/src/client:269 - client is checking its ready status (the sleep is used as a timeout)
  • distant/distant-net/src/client:397 - client is waiting for read or write
  • distant/distant-net/src/server/connection:585 - server is waiting for read or write
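client:269 uses a sleep as the timeout for a readiness check. If a loop around such a check cannot tell a timeout apart from a closed peer, it may keep arming a fresh timer forever after a disconnect. That is one possible (unconfirmed) explanation for the flood of sleep events after reconnecting. The std-only sketch below uses mpsc::Receiver::recv_timeout, which reports the two cases separately; wait_until_ready and ReadyStatus are hypothetical names.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Outcome of waiting for a readiness signal.
#[derive(Debug, PartialEq)]
enum ReadyStatus {
    Ready,
    TimedOut,
    Disconnected,
}

/// Hypothetical helper: waits up to `timeout` for a readiness signal. The
/// timeout bounds the wait, and a dropped sender is reported distinctly so a
/// caller can stop retrying instead of re-arming the timer indefinitely.
fn wait_until_ready(rx: &mpsc::Receiver<()>, timeout: Duration) -> ReadyStatus {
    match rx.recv_timeout(timeout) {
        Ok(()) => ReadyStatus::Ready,
        Err(RecvTimeoutError::Timeout) => ReadyStatus::TimedOut,
        Err(RecvTimeoutError::Disconnected) => ReadyStatus::Disconnected,
    }
}
```

A retry loop would then only re-arm on TimedOut and exit (or trigger a reconnect) on Disconnected.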

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.