sunng87 / ring-jetty9-adapter

An enhanced version of jetty adapter for ring, with additional features like websockets, http/2 and http/3

c10k problem in jdk 21 with virtual threads.

jasonjckn opened this issue · comments

I thought I'd post here because there was a previous thread here on Loom; feel free to close if off topic.
This is probably not an issue with ring-jetty9-adapter; the issue is likely further down the stack.

In a simple benchmark, my app can't handle 10k concurrent requests with virtual threads: I get very poor throughput and, most notably, socket errors. I've attached a minimal example to reproduce it.

c10k-problem.zip

I'm using Jetty v11; I'd be very curious to see how Jetty v12 handles it.
CC @jimpil


Context:

Running JDK 21 (EA); tried both Zulu and Oracle builds on arm64 macOS. (Also tested on Linux, which shows poor throughput but no socket errors.)
deps.edn

        info.sunng/ring-jetty9-adapter  {:mvn/version "0.22.1"}
        org.eclipse.jetty/jetty-server  {:mvn/version "11.0.15"}

Please see attachment for source code.

The tests were run using 'hey' and 'wrk'.

w/ 1k concurrent connections:

 ⚡ wrk --latency --timeout 1m -d 2s -c 1000 -t 1000 'http://localhost:3000/api/1.0/admin/uptest'
Running 2s test @ http://localhost:3000/api/1.0/admin/uptest
  1000 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.00s     3.29ms   1.02s    64.88%
    Req/Sec     0.04      0.20     1.00     95.70%
  Latency Distribution
     50%    1.00s
     75%    1.01s
     90%    1.01s
     99%    1.01s
  1999 requests in 2.10s, 322.10KB read
Requests/sec:    949.66
Transfer/sec:    153.02KB

w/ 10k concurrent connections:

 ⚡ wrk --latency --timeout 1m -d 2s -c 10000 -t 1000 'http://localhost:3000/api/1.0/admin/uptest'
Running 2s test @ http://localhost:3000/api/1.0/admin/uptest
  1000 threads and 10000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.41s     1.80ms   1.41s    65.54%
    Req/Sec     7.59      6.04    30.00     88.24%
  Latency Distribution
     50%    1.41s
     75%    1.41s
     90%    1.41s
     99%    1.41s
  148 requests in 2.11s, 23.85KB read
  Socket errors: connect 0, read 4982, write 0, timeout 0
Requests/sec:     70.00
Transfer/sec:     11.28KB

Comparing 1k vs 10k: the 10k run has thousands of socket read errors (4982 vs. none reported), far lower throughput (70 vs. ~950 requests/sec), and fewer total requests answered (148 vs. 1999).
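
One way to put the two runs side by side is a rough closed-loop estimate (Little's law): if every connection always has one request in flight, throughput should be about connections divided by average latency. The sketch below applies that to the figures reported by wrk above. The helper names are made up for illustration, and the estimate assumes all connections were established and kept busy, which the read errors in the 10k run suggest was not the case.

```python
# Rough closed-loop estimate: each connection keeps one request in flight,
# so throughput ~= connections / average latency (Little's law).
# Figures are copied from the two wrk runs above; helper names are illustrative.

def expected_rps(connections: int, avg_latency_s: float) -> float:
    """Requests/sec expected if every connection is continuously busy."""
    return connections / avg_latency_s


def achieved_fraction(observed_rps: float, connections: int, avg_latency_s: float) -> float:
    """Observed throughput as a fraction of the closed-loop estimate."""
    return observed_rps / expected_rps(connections, avg_latency_s)


RUNS = {
    "1k": {"connections": 1000, "avg_latency_s": 1.00, "observed_rps": 949.66},
    "10k": {"connections": 10000, "avg_latency_s": 1.41, "observed_rps": 70.00},
}

for name, run in RUNS.items():
    estimate = expected_rps(run["connections"], run["avg_latency_s"])
    fraction = achieved_fraction(
        run["observed_rps"], run["connections"], run["avg_latency_s"]
    )
    print(
        f"{name}: estimated {estimate:.0f} req/s, "
        f"observed {run['observed_rps']:.2f} req/s ({fraction:.1%} of estimate)"
    )
```

By this estimate the 1k run lands close to what its latency allows, while the 10k run delivers only around 1% of it, which points at connections failing rather than at requests merely being slow.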

If I use 'hey', I get similar stats, and it prints a ton of:

  [1]   Get "http://localhost:3000/api/1.0/admin/uptest": read tcp [::1]:62673->[::1]:3000: read: connection reset by peer
  [1]   Get "http://localhost:3000/api/1.0/admin/uptest": read tcp [::1]:62675->[::1]:3000: read: connection reset by peer
  [1]   Get "http://localhost:3000/api/1.0/admin/uptest": read tcp [::1]:62678->[::1]:3000: read: connection reset by peer
  [1]   Get "http://localhost:3000/api/1.0/admin/uptest": read tcp [::1]:62680->[::1]:3000: read: connection reset by peer
  

It could relate to this. Would it be easy for you to start the JVM with -Djdk.tracePinnedThreads=full and look for the relevant console output? Moreover, could you try with regular OS threads and see whether things improve (or not)?

Hmm... it could also have to do with your OS - see this.

@jimpil Thanks for the quick reply,

If none of your ideas solve it, I'm thinking the number of selectors and acceptors might be the issue too (https://eclipse.dev/jetty/javadoc/jetty-12/org/eclipse/jetty/server/ServerConnector.html), since they default to 1.

Hoping to get some more time this week to test these ideas out, thanks!

@jimpil Update on my experiments:

I tried -Djdk.tracePinnedThreads=full
Zero output from this... so I guess that's good.

I tried sudo sysctl -w net.inet.ip.portrange.first=32768
No difference

I tried various numbers of selector & acceptor threads
No difference

I also tried forcing C2 compilation for all bytecode (because I was seeing a lot of time spent compiling during profiling).
No difference

As for https://clojure.atlassian.net/jira/software/c/projects/CLJ/issues/CLJ-2771:
This is a possible culprit, since ring-jetty9-adapter calls enumeration-seq, et al., which is synchronized. It wouldn't be my first guess though.

The last thing I want to try is Jetty v12 and removing synchronized, but otherwise I'm a bit stumped.

If -Djdk.tracePinnedThreads=full didn't print out anything suspicious, then you can ignore CLJ-2771 (for your tests at least). I don't want to send you down the wrong path, but to me this sounds like an OS issue (hitting some sort of file descriptor or port limit).
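
A quick way to check the file descriptor side of that theory is to read the per-process limit on the machine running the server. Below is a minimal sketch in Python using the standard `resource` module (Unix only). The `reserved` headroom of 256 descriptors for files, jars, and other non-socket handles is an arbitrary assumption, and the function names are illustrative. The limit reported is the one for the process running the script, so run it from the same shell that starts the server.

```python
import resource


def nofile_limits() -> tuple[int, int]:
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def enough_descriptors(soft_limit: int, connections: int, reserved: int = 256) -> bool:
    """True if the soft limit covers one descriptor per connection plus headroom.

    `reserved` is an assumed allowance for non-socket descriptors.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True  # no limit configured
    return soft_limit >= connections + reserved


if __name__ == "__main__":
    soft, hard = nofile_limits()
    target = 10_000  # concurrent connections used in the benchmark above
    print(f"soft limit: {soft}, hard limit: {hard}")
    print(f"enough for {target} connections: {enough_descriptors(soft, target)}")
```

The load generator needs descriptors too: wrk or hey opening 10k connections from the same machine is subject to its own limit.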