valyala / fasthttp

Fast HTTP package for Go. Tuned for high performance. Zero memory allocations in hot paths. Up to 10x faster than net/http


Please provide real benchmark data and server information.

abacaj opened this issue · comments

The claim for 1m concurrent connections is a pretty big one. Please provide the following:

  • What machine was used to handle 1m connections? E.g. an m3.2xlarge (8 vCPUs, 30 GB memory).
    • To put it into perspective, node.js can handle 800k connections on an m3.2xlarge.
  • Are these just ping/pong connections? If so, then the actual throughput/rps is MUCH lower than 1 million.
    • G-WAN + Go can handle an average of 784,113 RPS (at least according to their homepage).
  • What was the average latency for handling 1m concurrent connections?
  • Were there any bottlenecks? E.g. is it just hardware that is holding this library back from achieving any more throughput?

Thank you.

My initial tests show there are a lot of failures: with only 100 concurrent connections and 5 req per sec, throughput drops by 8% (unacceptable) and siege fails.

Seems like it's averaging 1800 req/sec, which is only 4x better than net/http, not 10x :)

Any idea? Perhaps provide some sample code for me to test with.

In my sample code I am using err := server.ListenAndServe(":8000")

[screenshot: screen shot 2015-11-24 at 1 01 10 am]

What machine was used to handle 1m connections?

1M concurrent connections with 100k rps were achieved in production, not in a test environment. The server had the following configuration:

  • 8xCPU Intel(R) Xeon(R) CPU E5-1630 v3
  • 64GB RAM
  • 1Gbit network

Are these just ping/pong connections?

Long-living keep-alive connections are established by video clients all over the world. Clients periodically send event requests to the server over these connections. The server pushes event data to the db and sends back just a transparent pixel. Every client sends an event every 10 seconds on average.
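For illustration, a minimal sketch of what such a handler might look like is below; the channel-based buffering, the PNG encoding of the pixel, the port, and all names are assumptions for the sketch, not the actual production code.

```go
package main

import (
	"bytes"
	"image"
	"image/png"
	"log"

	"github.com/valyala/fasthttp"
)

// transparentPixel holds a pre-encoded 1x1 fully transparent PNG
// (the comment above only says "transparent pixel"; PNG is an assumption).
var transparentPixel []byte

// events buffers incoming event payloads for asynchronous insertion
// into the db by separate writer goroutines.
var events = make(chan []byte, 100000)

func init() {
	// Encode the 1x1 transparent image once at startup.
	var buf bytes.Buffer
	if err := png.Encode(&buf, image.NewRGBA(image.Rect(0, 0, 1, 1))); err != nil {
		panic(err)
	}
	transparentPixel = buf.Bytes()
}

// eventHandler records the event and replies with the pixel.
func eventHandler(ctx *fasthttp.RequestCtx) {
	// Copy the query string: fasthttp reuses ctx buffers after the handler returns.
	ev := append([]byte(nil), ctx.URI().QueryString()...)
	select {
	case events <- ev:
	default:
		// Drop the event if the db writers fall behind.
	}
	ctx.SetContentType("image/png")
	ctx.SetBody(transparentPixel)
}

func main() {
	if err := fasthttp.ListenAndServe(":8000", eventHandler); err != nil {
		log.Fatalf("error in ListenAndServe: %v", err)
	}
}
```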

What was the average latency for handling 1m concurrent connections?

Less than 100ms from the client side.

Were there any bottlenecks? E.g. is it just hardware that is holding this library back from achieving any more throughput?

The main bottleneck was the 1Gbit network, so we moved to 10Gbit :)
Also, the db (postgres) could handle only 100K inserts per second over a single db connection, so now we push event data over multiple db connections.
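A rough sketch of fanning inserts out over several db connections with database/sql is below; the driver choice (lib/pq), writer count, table name, and DSN are all assumptions for illustration, not the production setup.

```go
package main

import (
	"database/sql"
	"log"
	"sync"

	_ "github.com/lib/pq" // assumed postgres driver choice
)

// numWriters is an assumed number of parallel db connections; the comment
// above only says "multiple db connections".
const numWriters = 8

// startEventWriters drains the events channel into postgres over several
// connections, since a single connection topped out at ~100K inserts/sec.
func startEventWriters(db *sql.DB, events <-chan []byte, wg *sync.WaitGroup) {
	db.SetMaxOpenConns(numWriters)
	for i := 0; i < numWriters; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ev := range events {
				// Table and column names are placeholders.
				if _, err := db.Exec(`INSERT INTO events (payload) VALUES ($1)`, ev); err != nil {
					log.Printf("insert failed: %v", err)
				}
			}
		}()
	}
}

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/events?sslmode=disable") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	events := make(chan []byte, 100000)
	var wg sync.WaitGroup
	startEventWriters(db, events, &wg)

	// In the real server the HTTP handler feeds this channel; here we just
	// push a few sample payloads and shut down.
	for i := 0; i < 3; i++ {
		events <- []byte("sample event")
	}
	close(events)
	wg.Wait()
}
```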

We have moved to a 32-CPU, 128GB RAM, 10Gbit server now. Preliminary results show that the server can handle over 500K rps. Unfortunately we have no 5M concurrent clients yet for testing such a load :(

Any idea? Perhaps provide some sample code for me to test with.

The rps seems too low for both net/http and fasthttp. Maybe your request handler is too heavy. See the sample code from the pull request to the TechEmpower benchmarks.
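For reference, a minimal server with a near-empty handler, similar in spirit to such benchmark samples, might look like this (the port and response body are arbitrary):

```go
package main

import (
	"log"

	"github.com/valyala/fasthttp"
)

// requestHandler does almost no work per request, so the benchmark
// measures the server itself rather than the handler.
func requestHandler(ctx *fasthttp.RequestCtx) {
	ctx.SetContentType("text/plain; charset=utf-8")
	ctx.SetBodyString("Hello, world!")
}

func main() {
	s := &fasthttp.Server{
		Handler: requestHandler,
	}
	// Same ListenAndServe call as in the earlier comment.
	if err := s.ListenAndServe(":8000"); err != nil {
		log.Fatalf("error in ListenAndServe: %v", err)
	}
}
```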

FYI, the server process ate 10GB of RAM when serving 1M concurrent connections, i.e. ~10KB per connection, including the memory required for pushing event data to the db, memory fragmentation and GC overhead.

thanks for the example, it's useful 👍

My initial tests show there are a lot of failures: with only 100 concurrent connections and 5 req per sec, throughput drops by 8% (unacceptable) and siege fails.
[screenshot: screen shot 2015-11-24 at 1 01 10 am]

@abacaj, perhaps those errors are caused by a small pool and unlimited dialing (see golang/go#6785).
@valyala, do you plan to limit the max in-flight dialing?

@rkravchik judging from the first lines of your screenshot, you aren't using keep-alive connections and your system/user ran out of available file descriptors or address:port combinations. If you're going to do a benchmark, please do it properly.

@erikdubbelboer it was a reply to the second post. Address your message to the proper recipient.
Moreover, if you read the issue in the golang repo you'll find why the system may run out of descriptors due to improper behaviour.

@rkravchik I'm so sorry, apparently I wasn't awake yet and didn't notice you were quoting a previous comment.

You can use Client.MaxConnsPerHost to limit the max in-flight dialing.
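A small sketch of that approach; the limit of 100 and the target URL are arbitrary values for illustration:

```go
package main

import (
	"fmt"
	"log"

	"github.com/valyala/fasthttp"
)

func main() {
	// Cap the number of connections (and therefore in-flight dials) to a single host.
	c := &fasthttp.Client{
		MaxConnsPerHost: 100,
	}

	statusCode, body, err := c.Get(nil, "http://localhost:8000/")
	if err == fasthttp.ErrNoFreeConns {
		// All MaxConnsPerHost connections are busy; back off or retry.
		log.Fatal("no free connections to host")
	}
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(statusCode, len(body))
}
```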

@erikdubbelboer net/http's Transport also has the
MaxIdleConns int
MaxIdleConnsPerHost int
knobs.
But under some circumstances (described in golang/go#6785) there is a lot of dialing, which is why one more knob, MaxConnsPerHost, was added in Go 1.11.

As far as I can see in the code, Client.MaxConnsPerHost is a hard limit that causes an ErrNoFreeConns error, so there is no way to hit the problem that exists in net/http below Go 1.11. Am I right?
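For comparison, a sketch of how the net/http Transport knobs discussed above might be configured on Go 1.11+; the limits, URL, and timeout are arbitrary values for illustration:

```go
package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	// MaxConnsPerHost (Go 1.11+) caps dialing + in-use + idle connections
	// to a single host; the idle knobs alone do not bound dialing.
	tr := &http.Transport{
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 100,
		MaxConnsPerHost:     100,
	}
	client := &http.Client{Transport: tr, Timeout: 10 * time.Second}

	resp, err := client.Get("http://localhost:8000/")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println(resp.Status)
}
```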

@erikdubbelboer thank you for your patience.