Server hangs
chipsenkbeil opened this issue
I'd seen this as part of #175, so this may be related.
Recent changes to support batch processing seem to trigger this more frequently when writing. In particular, writing with distant.nvim works fine, but once I leave neovim and rejoin, the server appears to hang. So there may be a busy loop somewhere.
Points to investigate
Using tokio-console, I can see some resources are growing and never disappearing, which may be a sign of issues.
Manager without doing anything:
distant/distant-net/src/server.rs:183 (tokio::sync::rwlock)
distant/distant-net/src/server/ref.rs:57 (tokio::time::sleep) - a bunch of these being created, about 10 per second with no connections (seems to be polling_wait)
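To make the "about 10 per second" figure concrete, here is a minimal std-only sketch of an idle polling loop. It assumes a 100 ms poll interval, which is only inferred from the observed rate and not confirmed from the code, and `count_idle_sleeps` is a hypothetical name. Each pass creates one fresh sleep; with tokio::time::sleep, each of those would presumably show up as a new resource in tokio-console.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Runs an idle polling loop for `window`, sleeping `interval` between polls,
/// and returns how many sleeps were created. Hypothetical stand-in for a loop
/// that calls tokio::time::sleep on every pass.
fn count_idle_sleeps(window: Duration, interval: Duration) -> u32 {
    let start = Instant::now();
    let mut sleeps = 0;
    while start.elapsed() < window {
        // No connections, so there is nothing to do except wait and poll again.
        sleep(interval);
        sleeps += 1;
    }
    sleeps
}
```

Calling `count_idle_sleeps(Duration::from_secs(1), Duration::from_millis(100))` returns at most 10, since every sleep lasts at least the requested interval. That would be consistent with a steady stream of short-lived sleep resources even when the manager has no connections.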
After connecting a client to the manager:
distant/distant-net/src/client.rs:269 (tokio::time::sleep) - a bunch of these being created
distant/distant-net/src/client.rs:397 (tokio::time::sleep) - a bunch of these being created
Connection task never wakes up again:
distant/distant-net/src/server/connect:?? (tokio::task) - seems to occasionally come back to running even with the idle and waker warning (too long to see the line, possibly the tokio::spawn for handler.on_request on line 500)
Screenshots
TASKS
![image](https://private-user-images.githubusercontent.com/2481802/245233474-814be2c1-2c4d-4712-be90-9be666d5340d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTcxMDc0MTUsIm5iZiI6MTcxNzEwNzExNSwicGF0aCI6Ii8yNDgxODAyLzI0NTIzMzQ3NC04MTRiZTJjMS0yYzRkLTQ3MTItYmU5MC05YmU2NjZkNTM0MGQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDUzMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA1MzBUMjIxMTU1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZDMwNTQ0OTA1NmYzZDhiZDhhZjkwNzRmN2ZiZGMyNTg4ZmIzNjhiMzZjM2I3YjA0MGVhNzliOThhMTI1ODcwMyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.SDzN8dcfwsQlBkZvCJgmVTKnt9_X8h2jAJNmzuXk5U8)
RESOURCES
![image](https://private-user-images.githubusercontent.com/2481802/245233535-2bb0a1f5-4d8b-4974-9e80-f53fcd24aa9c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTcxMDc0MTUsIm5iZiI6MTcxNzEwNzExNSwicGF0aCI6Ii8yNDgxODAyLzI0NTIzMzUzNS0yYmIwYTFmNS00ZDhiLTQ5NzQtOWU4MC1mNTNmY2QyNGFhOWMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDUzMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA1MzBUMjIxMTU1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MGJhNTg2NGMzMWE1NWU3MGQ0ZGMyMzBiZmZlNDk5Y2Y2ZjFiNGRmN2UyMmYwZjA5MmM4MWI3ZWI4MWU4MTdmNSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.NaBjuvr4P9kHen_OkWE7-HPANcNxggK6sVjYNiwpKik)
HTOP
![image](https://private-user-images.githubusercontent.com/2481802/245233795-d11ce08c-222a-41e7-8998-9633f49878e3.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTcxMDc0MTUsIm5iZiI6MTcxNzEwNzExNSwicGF0aCI6Ii8yNDgxODAyLzI0NTIzMzc5NS1kMTFjZTA4Yy0yMjJhLTQxZTctODk5OC05NjMzZjQ5ODc4ZTMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDUzMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA1MzBUMjIxMTU1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NjA5YWE5ZjA3MWY5MjU5YmZlNWE4MTRlMjMzZDYwY2RiOWQ1ZDExNDUzNzY5OTQ3ODkzYjkwOTA5Zjg0MjI2NyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.D-vg0Fb2U_boteiQL8nQT_NmfiN4vwXzJyXriUeclDY)
For this one, high CPU could be due to tracing.
Example of sleep task that has lost association:
![image](https://private-user-images.githubusercontent.com/2481802/245234513-ebb7625a-cf35-4709-b20e-2b0687954754.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTcxMDc0MTUsIm5iZiI6MTcxNzEwNzExNSwicGF0aCI6Ii8yNDgxODAyLzI0NTIzNDUxMy1lYmI3NjI1YS1jZjM1LTQ3MDktYjIwZS0yYjA2ODc5NTQ3NTQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDUzMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA1MzBUMjIxMTU1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NDhlNTA1MmYyZjhmZWU3ZGVhNTJkN2NiMmUxYTE1MzM2MDlmNjNjZWQyYTk1NmY5YThlNGVjNzMzYjlmYTc4ZCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.Ae1N6g5mwVdjgI81C8mEyggSJVLwpbDfk5uREnYDKuE)
Compared to one with an association:
![image](https://private-user-images.githubusercontent.com/2481802/245234568-fd778713-c688-468f-bc13-b100355cf95a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTcxMDc0MTUsIm5iZiI6MTcxNzEwNzExNSwicGF0aCI6Ii8yNDgxODAyLzI0NTIzNDU2OC1mZDc3ODcxMy1jNjg4LTQ2OGYtYmMxMy1iMTAwMzU1Y2Y5NWEucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDUzMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA1MzBUMjIxMTU1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9YjgyODJiODA1ZjAwYmY2ZDIxZDhkOGU0MzQ1YjE0MjQ5OWI0N2M5NzFhZWVmZTRjNzMwOTI0OTE5MTEwOGVlOCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.QrYRqzYiHf8IIvsQp0306xNPlnUU07cXgFeTkGe3w1Y)
Also notice the large ms time
Manager server rwlock
![image](https://private-user-images.githubusercontent.com/2481802/245235176-d960665d-30dc-4105-9be9-b970419da16f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTcxMDc0MTUsIm5iZiI6MTcxNzEwNzExNSwicGF0aCI6Ii8yNDgxODAyLzI0NTIzNTE3Ni1kOTYwNjY1ZC0zMGRjLTQxMDUtOWJlOS1iOTcwNDE5ZGExNmYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDUzMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA1MzBUMjIxMTU1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NWYzZGEyYTA2YThiZjFmNTBmMmFjZmQ3MDZkMjkwNjI3OTJiOTZiNTliZDQ1YjcyZmJiYjAwZjQ4NDQyYTg3ZCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.8_uWweNAewymLv9P59F-f3d_EdqAL36CuxKvn4m7VUY)
![image](https://private-user-images.githubusercontent.com/2481802/245235050-27f3ed1a-068e-4f67-8d7b-da5298908054.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTcxMDc0MTUsIm5iZiI6MTcxNzEwNzExNSwicGF0aCI6Ii8yNDgxODAyLzI0NTIzNTA1MC0yN2YzZWQxYS0wNjhlLTRmNjctOGQ3Yi1kYTUyOTg5MDgwNTQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDUzMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA1MzBUMjIxMTU1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NGY5ZDg5MDVjZjRhZmM5NjhhMzI1NjFhNzA4ODFkMDQwNTlmNDg3OWM0Zjk2MWI3Njc1OTkxYTcxMDY1MzRjNSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.f-yR77PdRhKCjHXVeCJckemBjH71rzEUSCLiYlU74ok)
Keychain writing
![image](https://private-user-images.githubusercontent.com/2481802/245235469-45453e7f-9f9d-4721-8215-1fd65a33eb9b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTcxMDc0MTUsIm5iZiI6MTcxNzEwNzExNSwicGF0aCI6Ii8yNDgxODAyLzI0NTIzNTQ2OS00NTQ1M2U3Zi05ZjlkLTQ3MjEtODIxNS0xZmQ2NWEzM2ViOWIucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDUzMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA1MzBUMjIxMTU1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ODQwNTlhMDQzZmI5OWVmZmE4YTcxMTE2YjNjYWU5Y2Q0NDA5NjU1MWQxMmRjOTZhNzdmYTk1YzNkYzYzNTFkMSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.9fXdWVkiwZiogjNribsCPiRQGgZ-HHmfPDvaqfVpDEQ)
Two separate task sources for RwLock::write. Could that be a problem?
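On the RwLock::write question: two call sites taking the write lock should not be a problem by themselves, but the second writer cannot proceed while the first guard is still alive. The sketch below illustrates that with std::sync::RwLock as a stand-in for tokio::sync::RwLock. That substitution is an assumption made to keep the example dependency-free (the async lock would wait where `try_write` fails), and `second_writer_outcomes` is a hypothetical name.

```rust
use std::sync::RwLock;

/// Returns whether a second writer could acquire the lock
/// (while the first guard is held, after the first guard is dropped).
fn second_writer_outcomes() -> (bool, bool) {
    let lock = RwLock::new(0u32);

    // First task source takes the write lock and keeps the guard alive.
    let first = lock.write().unwrap();
    // Second task source tries to write; try_write does not block.
    let while_held = lock.try_write().is_ok();
    drop(first);

    // With the first guard gone, the second writer gets through.
    let after_release = lock.try_write().is_ok();
    (while_held, after_release)
}
```

If one of the two sources keeps its guard for a long time, for example across a slow operation, the other source would stall for that whole period, which might look like a hang from the outside.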
Once hangup happens
![image](https://private-user-images.githubusercontent.com/2481802/245236115-f1bbda6d-410a-4b33-84b2-50f0f60746c8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTcxMDc0MTUsIm5iZiI6MTcxNzEwNzExNSwicGF0aCI6Ii8yNDgxODAyLzI0NTIzNjExNS1mMWJiZGE2ZC00MTBhLTRiMzMtODRiMi01MGYwZjYwNzQ2YzgucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDUzMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA1MzBUMjIxMTU1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9OGJjOTk5ZjkzMWEyNmI1MWU2ODYyZDM2YjcyNTQ0YTNhM2YzYzlkODkwOGRiNzE3YzA4YTljZmFmZmNkNjlkZSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.NwFiaKP-0-scFRuUBiCX9TK4ZRZGQsqn5UvHZSRkWuU)
Flood of client and server connection sleep events. Wondering if there's an issue with reconnecting or something.
The only sleep in the connection file is the one that waits for read or write availability when both read and write are blocked. This is done in a loop, so we could add logging here to report when being fully blocked starts and ends.
distant/distant-net/src/client:269 - client is checking ready status (sleep used as timeout condition)
distant/distant-net/src/client:397 - client is waiting for read or write
distant/distant-net/src/server/connection:585 - server is waiting for read or write
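A minimal sketch of the "log when being fully blocked starts and ends" idea, assuming the wait loop can tell on each pass whether both read and write were blocked. `BlockedTracker`, `Transition`, and `observe` are hypothetical names and not part of distant. The tracker reports only state changes, so the loop would log twice per blocked period and not once per sleep.

```rust
use std::time::{Duration, Instant};

/// A change in the fully-blocked state of a connection loop.
#[derive(Debug, PartialEq)]
enum Transition {
    /// Read and write both became blocked.
    Started,
    /// The loop made progress again after being blocked this long.
    Ended(Duration),
}

/// Remembers when the current fully-blocked period began, if any.
#[derive(Debug, Default)]
struct BlockedTracker {
    blocked_since: Option<Instant>,
}

impl BlockedTracker {
    /// Call once per loop pass. Returns a transition only when the state
    /// changes, so repeated blocked passes stay silent.
    fn observe(&mut self, fully_blocked: bool, now: Instant) -> Option<Transition> {
        match (self.blocked_since, fully_blocked) {
            (None, true) => {
                self.blocked_since = Some(now);
                Some(Transition::Started)
            }
            (Some(since), false) => {
                self.blocked_since = None;
                Some(Transition::Ended(now.duration_since(since)))
            }
            // Still blocked, or still making progress: nothing to report.
            _ => None,
        }
    }
}
```

In the loops at the three locations above, the idea would be to call `observe` just before each sleep and emit a log line whenever it returns `Some`. A very long `Ended` duration, or a `Started` with no matching `Ended`, would point at the connection that stopped making progress.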