websockets-rs / rust-websocket

A WebSocket (RFC6455) library written in Rust

Home Page: http://websockets-rs.github.io/rust-websocket/

Performance issues when running on Linux

iBelieve opened this issue · comments

I've been using rust-websocket for a while now and have been happy with it. However, it doesn't seem to perform as well as I would expect. I put together some tests with a simple echo server and a client that sends a message and times how long it takes to get it echoed back. With the server and client both running locally, rust-websocket performs fine on macOS, but on Linux I'm seeing roughly 80ms of extra round-trip delay over a localhost connection. My test programs and test results are here: https://github.com/lelander/websocket-tests
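For readers who want to reproduce this kind of measurement without either WebSocket library, here is a minimal sketch of an echo round-trip timer over plain TCP using only the standard library. It is not the code from the linked test repository; the function names `spawn_echo_server` and `round_trip` are made up for illustration.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Starts a one-connection echo server on an OS-assigned localhost port
/// and returns the address it is listening on. (Hypothetical helper.)
fn spawn_echo_server() -> io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        if let Ok((mut conn, _)) = listener.accept() {
            let mut buf = [0u8; 1024];
            // Echo every chunk back until the peer closes the connection.
            while let Ok(n) = conn.read(&mut buf) {
                if n == 0 || conn.write_all(&buf[..n]).is_err() {
                    break;
                }
            }
        }
    });
    Ok(addr)
}

/// Sends `msg` and waits until the same number of bytes has come back.
fn round_trip(stream: &mut TcpStream, msg: &[u8]) -> io::Result<Duration> {
    let mut reply = vec![0u8; msg.len()];
    let start = Instant::now();
    stream.write_all(msg)?;
    stream.read_exact(&mut reply)?;
    Ok(start.elapsed())
}

fn main() -> io::Result<()> {
    let addr = spawn_echo_server()?;
    let mut stream = TcpStream::connect(addr)?;
    let elapsed = round_trip(&mut stream, b"ping")?;
    println!("Round-trip time: {}us", elapsed.as_micros());
    Ok(())
}
```

Because the message goes out in a single write, this baseline should not show the delay discussed in this issue; it only gives a reference number for the machine.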

Testing in my production environment using armv7 devices on both ends, rust-websocket seems to have a round-trip time of around 340ms, while a client/server echo test using ws-rs only takes 90ms.

Server      Client      Round-trip time
ws          ws          90ms
websocket   ws          211ms
ws          websocket   220ms
websocket   websocket   341ms

Do I not have rust-websocket configured correctly, or am I doing something obviously wrong? I'm not sure why I'm seeing such significant performance differences with this library compared with ws-rs and the C++ and Python websocket libraries I've tried.

The main problem is Nagle's algorithm:

$ target/release/server
...

$ target/release/client
Connected to websocket server
Round-trip time: 100446us
$ LD_PRELOAD=/mnt/src/git/libnodelay/libnodelay.so target/release/server
...

$ LD_PRELOAD=/mnt/src/git/libnodelay/libnodelay.so target/release/client
Connected to websocket server
Round-trip time: 151us
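The same effect can be had without LD_PRELOAD by setting TCP_NODELAY on the socket in the program itself. Below is a minimal sketch using the standard library's `TcpStream::set_nodelay`; the helper name `connect_nodelay` is hypothetical, and how you reach the underlying TCP stream from a WebSocket client depends on the library version you use.

```rust
use std::io;
use std::net::{TcpListener, TcpStream, ToSocketAddrs};

/// Connects to `addr` and disables Nagle's algorithm on the new socket.
/// (Hypothetical helper, not part of any WebSocket library.)
fn connect_nodelay<A: ToSocketAddrs>(addr: A) -> io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    // With TCP_NODELAY set, small writes are sent immediately instead of
    // being held back while earlier data is still unacknowledged.
    stream.set_nodelay(true)?;
    Ok(stream)
}

fn main() -> io::Result<()> {
    // A local listener stands in for the real server in this sketch.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let stream = connect_nodelay(listener.local_addr()?)?;
    println!("TCP_NODELAY enabled: {}", stream.nodelay()?);
    Ok(())
}
```

The option has to be set on both ends, since each side applies Nagle's algorithm to its own outgoing data.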

A secondary problem is the lack of proper buffering: one message is sent piece by piece using multiple syscalls:

sendto(3, "\201", 1, MSG_NOSIGNAL, NULL, 0) = 1
sendto(3, "\215", 1, MSG_NOSIGNAL, NULL, 0) = 1
sendto(3, "(1\263\334", 4, MSG_NOSIGNAL, NULL, 0) = 4
sendto(3, "\31\4\200\344\30\0\202\350\35\6\212\355\35", 13, MSG_NOSIGNAL, NULL, 0) = 13
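The four writes above correspond to the parts of a masked client frame as laid out in RFC 6455: the FIN/opcode byte, the mask bit plus length, the 4-byte masking key, and the masked payload. As a rough sketch of what buffering could look like (this is not rust-websocket's actual code, and `encode_small_text_frame` and `send_frame` are hypothetical names), the whole frame can be assembled in memory and handed to the socket in one `write_all` call:

```rust
use std::io::{self, Write};

/// Encodes a masked text frame (FIN set, opcode 0x1) into one buffer.
/// Sketch only: handles payloads shorter than 126 bytes.
fn encode_small_text_frame(payload: &[u8], mask: [u8; 4]) -> Vec<u8> {
    assert!(payload.len() < 126, "extended lengths are not handled here");
    let mut frame = Vec::with_capacity(2 + 4 + payload.len());
    frame.push(0x81); // FIN bit + text opcode
    frame.push(0x80 | payload.len() as u8); // MASK bit + 7-bit length
    frame.extend_from_slice(&mask); // masking key
    // Payload bytes XORed with the masking key, cycling every 4 bytes.
    frame.extend(payload.iter().enumerate().map(|(i, b)| b ^ mask[i % 4]));
    frame
}

/// Builds the frame first, then passes it to the writer in a single call.
fn send_frame<W: Write>(out: &mut W, payload: &[u8], mask: [u8; 4]) -> io::Result<()> {
    let frame = encode_small_text_frame(payload, mask);
    out.write_all(&frame)
}

fn main() -> io::Result<()> {
    // A Vec stands in for a TcpStream; the masking key is the one visible
    // in the strace output above (a real client picks a random key per frame).
    let mut sink = Vec::new();
    send_frame(&mut sink, b"hello", [0x28, 0x31, 0xB3, 0xDC])?;
    println!("{} bytes handed to the writer in one call", sink.len());
    Ok(())
}
```

On a real `TcpStream`, a single `write_all` of a small buffer normally results in one send syscall, which also avoids the interaction with Nagle's algorithm. Wrapping the stream in `std::io::BufWriter` and flushing after each message is another way to get a similar result.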

Note: using websocat (which is based on rust-websocket) as the server is also fast:

$ websocat -Et ws-l:127.0.0.1:3000 mirror:
...

$ LD_PRELOAD=/mnt/src/git/libnodelay/libnodelay.so target/release/client
Connected to websocket server
Round-trip time: 140us

Maybe add the async version of rust-websocket to the list?

Setting TCP_NODELAY using libnodelay or client.set_nodelay(true) did the trick and brought round-trip times to around the same numbers that I was seeing from the other libraries I've tested. Thanks for the assistance!