c10k problem in jdk 21 with virtual threads.
jasonjckn opened this issue
I thought I'd post here because there was a previous thread here on Loom; feel free to close if off topic.
This is probably not an issue with ring-jetty9-adapter; the issue is likely further down the stack.
In a simple benchmark, my app can't handle 10k concurrent requests with virtual threads: I get very poor throughput and, most notably, socket errors. I've attached minimal code to reproduce it.
I'm using Jetty v11; I'd be very curious to see how Jetty v12 handles it.
CC @jimpil
Context:
Running JDK 21 (EA); tried both Zulu and Oracle builds on arm64 macOS. Also tested on Linux, which has poor throughput but no socket errors.
deps.edn
info.sunng/ring-jetty9-adapter {:mvn/version "0.22.1"}
org.eclipse.jetty/jetty-server {:mvn/version "11.0.15"}
Please see attachment for source code.
The tests were run using 'hey' and 'wrk'.
With 1k concurrent connections:
⚡ wrk --latency --timeout 1m -d 2s -c 1000 -t 1000 'http://localhost:3000/api/1.0/admin/uptest'
Running 2s test @ http://localhost:3000/api/1.0/admin/uptest
1000 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.00s 3.29ms 1.02s 64.88%
Req/Sec 0.04 0.20 1.00 95.70%
Latency Distribution
50% 1.00s
75% 1.01s
90% 1.01s
99% 1.01s
1999 requests in 2.10s, 322.10KB read
Requests/sec: 949.66
Transfer/sec: 153.02KB
With 10k concurrent connections:
⚡ wrk --latency --timeout 1m -d 2s -c 10000 -t 1000 'http://localhost:3000/api/1.0/admin/uptest'
Running 2s test @ http://localhost:3000/api/1.0/admin/uptest
1000 threads and 10000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.41s 1.80ms 1.41s 65.54%
Req/Sec 7.59 6.04 30.00 88.24%
Latency Distribution
50% 1.41s
75% 1.41s
90% 1.41s
99% 1.41s
148 requests in 2.11s, 23.85KB read
Socket errors: connect 0, read 4982, write 0, timeout 0
Requests/sec: 70.00
Transfer/sec: 11.28KB
Comparing 1k vs 10k: the 10k run has socket errors (4982 read errors vs none), far lower throughput (70 vs 950 requests/sec), and far fewer total requests answered (148 vs 1999).
If I use 'hey', I get similar stats, and it prints a ton of errors like:
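As a rough sanity check on those numbers (a sketch, not part of the attached repro): by Little's law, throughput is roughly concurrency divided by latency, assuming every connection always has exactly one request in flight. Plugging in the connection counts and the average latencies wrk reported gives an ideal rate to compare against the observed one. The class and method names below are made up for illustration.

```java
public class LittlesLaw {
    // Little's law: concurrency = throughput * latency,
    // so the ideal throughput is concurrency / latency.
    // Assumption: every connection keeps one request in flight at all times.
    static double expectedThroughput(int connections, double latencySeconds) {
        return connections / latencySeconds;
    }

    // Prints the ideal rate next to the rate wrk actually measured.
    static void report(int connections, double latencySeconds, double observedPerSec) {
        double expected = expectedThroughput(connections, latencySeconds);
        System.out.printf(
            "%d connections: expected ~%.0f req/s, observed %.2f req/s (%.1f%% of expected)%n",
            connections, expected, observedPerSec, 100.0 * observedPerSec / expected);
    }

    public static void main(String[] args) {
        // Connection counts, average latencies, and Requests/sec copied from the two wrk runs.
        report(1_000, 1.00, 949.66);
        report(10_000, 1.41, 70.00);
    }
}
```

By that estimate the 1k run reaches roughly 95% of the ideal rate, while the 10k run reaches roughly 1%, which would fit most connections being reset rather than merely served slowly.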
[1] Get "http://localhost:3000/api/1.0/admin/uptest": read tcp [::1]:62673->[::1]:3000: read: connection reset by peer
[1] Get "http://localhost:3000/api/1.0/admin/uptest": read tcp [::1]:62675->[::1]:3000: read: connection reset by peer
[1] Get "http://localhost:3000/api/1.0/admin/uptest": read tcp [::1]:62678->[::1]:3000: read: connection reset by peer
[1] Get "http://localhost:3000/api/1.0/admin/uptest": read tcp [::1]:62680->[::1]:3000: read: connection reset by peer
It could relate to this. Would it be easy for you to start the JVM with -Djdk.tracePinnedThreads=full and look for the relevant console output? Moreover, could you try with regular OS threads and see whether things improve?
Hmm... it could also have to do with your OS - see this.
@jimpil Thanks for the quick reply,
If none of your ideas solve it, I'm thinking the number of selectors and acceptors might be the issue too (see https://eclipse.dev/jetty/javadoc/jetty-12/org/eclipse/jetty/server/ServerConnector.html), since they default to 1.
Hoping to get some more time this week to test these ideas out, thanks!
@jimpil, an update on my experiments:
I tried -Djdk.tracePinnedThreads=full: zero output from this, so I guess that's good.
I tried sudo sysctl -w net.inet.ip.portrange.first=32768: no difference.
I tried various numbers of selector and acceptor threads: no difference.
I also tried forcing C2 compilation for all bytecode (because I was seeing a lot of time spent compiling during profiling): no difference.
As for https://clojure.atlassian.net/jira/software/c/projects/CLJ/issues/CLJ-2771: this is a possible culprit, since ring-jetty9-adapter calls enumeration-seq et al., which are synchronized. It wouldn't be my first guess, though.
The last thing I want to try is Jetty v12 and removing synchronized, but otherwise I'm a bit stumped.
If -Djdk.tracePinnedThreads=full didn't print out anything suspicious, then you can ignore CLJ-2771 (for your tests at least). I don't want to send you down the wrong path, but to me this sounds like an OS issue (hitting some sort of file descriptor or port limit).
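To check the file descriptor theory from the server side, here is a minimal sketch that prints the JVM's own open and maximum descriptor counts. It assumes a HotSpot-style JDK on a Unix-like OS, where the platform bean implements com.sun.management.UnixOperatingSystemMXBean; FdLimits and fdCounts are hypothetical names. It only reports the limit of the process it runs in, so the load generator's limit (wrk or hey) has to be checked separately, for example with ulimit -n in the shell that launches it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdLimits {
    // Returns {open, max} file descriptor counts for this JVM,
    // or null when the platform bean is not the Unix variant (e.g. on Windows).
    static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = fdCounts();
        if (counts == null) {
            System.out.println("File descriptor counts are not available on this platform.");
        } else {
            System.out.println("open file descriptors: " + counts[0]);
            System.out.println("max file descriptors:  " + counts[1]);
        }
    }
}
```

Each accepted connection needs one descriptor in the server process, so if the reported maximum is close to or below the number of concurrent connections, that would point toward the limit rather than toward virtual threads.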