hyperledger-archives / iroha

Iroha - A simple, decentralized ledger

Home Page: http://iroha.tech

Iroha performance

truongnmt opened this issue · comments

I set up a scenario to test Iroha performance. This is my environment:

  • AWS t3.small instance: 2 vCPUs, 2.5 GHz, Intel Skylake P-8175, 2 GiB memory
  • Iroha docker develop tag
  • 1 host with 4 peers (nodes)
  • Python SDK with a Flask server API to handle user requests

I set up a scenario using JMeter: 300 threads (users) each send one create-user request. The result is very bad: the error rate is 95%.

Here are the logs from my 4 peers:
https://gist.github.com/truongnmt/83a179c3cb83f5830f9ac709238eaf13

I also found Caliper, a benchmarking framework for blockchains. I installed everything, but the part about running the benchmark is very confusing.

I have been tuning some parameters; here is what I got.

First attempt:

max_proposal_size: 50
proposal_delay: 7000
vote_delay: 100
300 threads, ramp-up 5 seconds (meaning it takes 5 seconds for all threads to send their requests, so roughly 60 tx/s)
Error rate: 96%

screenshot from 2018-10-03 11-22-24

Second attempt:

max_proposal_size: 200
proposal_delay: 7000
vote_delay: 100
300 threads, ramp-up 5 seconds
Error rate: 96%

screenshot from 2018-10-03 12-58-23

Third attempt:

max_proposal_size: 300
proposal_delay: 10000
vote_delay: 100
300 threads, ramp-up 5 seconds
Error rate: 97%

screenshot from 2018-10-03 13-29-03

Fourth attempt:

max_proposal_size: 300
proposal_delay: 1000
vote_delay: 100
300 threads, ramp-up 5 seconds
Error rate: 70.47%

screenshot from 2018-10-03 13-51-56

Fifth attempt:

max_proposal_size: 300
proposal_delay: 500
vote_delay: 100
300 threads, ramp-up 5 seconds
Error rate: 35%

screenshot from 2018-10-03 14-02-33

Sixth attempt:

max_proposal_size: 300
proposal_delay: 100
vote_delay: 100
300 threads, ramp-up 5 seconds
Error rate: 81.54% :(

screenshot from 2018-10-03 14-11-15

Notice that on the sixth attempt, I got this error:

[2018-10-03 07:06:12.498330740][th:1261][warning] BlockLoaderImpl Block not found
[2018-10-03 07:06:12.499745529][th:1261][error] YacGate Could not get block from block loader

After this error, Iroha no longer receives any requests. Even after I restart the Iroha instance, it says:

[2018-10-03 07:15:19.669143938][th:998][info] AsyncGrpcClient transactions in proposal: 1
[2018-10-03 07:15:19.669978975][th:998][info] OrderingGate Received new proposal, height: 864

It keeps saying that, and the height increases after I send a request again.

commented

That's amazing research you've done, thank you!
But there is still not enough information to draw conclusions and understand the core of the issue.

1 host with 4 peers (nodes)

Am I correct that you launched one AWS instance with 4 Iroha peers on it? That may cause thread scheduling problems, memory exhaustion, and numerous other issues.
Also, where was the client launched (on the same AWS instance or outside)?

I set up a scenario using JMeter ... error rate 95%

I'm not familiar with that tool. Could you explain what "error rate" means in this particular context? Dropped network packets, cancellations at the OSI application layer, or something else?

Any additional info might be really helpful, so feel free to share anything else (even if you consider it barely useful)!

Sorry for my late response.

1 host with 4 peers (nodes)

Yes, one AWS instance and 4 Iroha peers. I set up JMeter (the client) on my local machine. On the AWS instance I set up Iroha and the Iroha Python SDK with a Flask server API to handle user requests.

I also ran a test with 4 AWS instances, each running 1 peer, but I didn't notice any difference compared with 1 instance running 4 peers.

I think the reason is this: under heavy load, the waiting time is too long (because of either max_proposal_size or proposal_delay), so most of the requests time out, even though the transactions themselves are processed very quickly. With the default config:

"max_proposal_size" : 10,
"proposal_delay" : 5000,
"vote_delay" : 5000,
"load_delay" : 5000

When I use Postman to send a single request to the server, it takes 10 s to complete. With 100 concurrent users, timeouts are inevitable. With this config:

max_proposal_size: 300
proposal_delay: 500
vote_delay: 100,
load_delay: 5000

The request only takes 600-800 ms @@. That's nuts!
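The two measurements above line up with a very rough model in which a lone request waits for about one proposal_delay plus one vote_delay. That model is only an assumption drawn from these numbers, not documented Iroha behaviour, and the function name below is hypothetical:

```python
# Rough model (an assumption, not Iroha's documented behaviour):
# a single request waits for one proposal round plus one voting round.

def estimated_latency_ms(proposal_delay_ms: int, vote_delay_ms: int) -> int:
    """Return the estimated wait for a single request, in milliseconds."""
    return proposal_delay_ms + vote_delay_ms

# The two configurations quoted above.
default_cfg = {"proposal_delay": 5000, "vote_delay": 5000}
tuned_cfg = {"proposal_delay": 500, "vote_delay": 100}

for name, cfg in (("default", default_cfg), ("tuned", tuned_cfg)):
    ms = estimated_latency_ms(cfg["proposal_delay"], cfg["vote_delay"])
    print(f"{name}: ~{ms} ms")
```

The estimates (about 10 s and about 0.6 s) are close to what Postman showed, which suggests the delays, not the transaction processing itself, dominate the response time.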

About JMeter: it is a tool for testing web performance. I set up a scenario in which 300 users each create an account. You just provide the params and the web server's host IP, and it runs the scenario for you. A successful request returns status 200. The rest (501, 502, ...) are failures. So the error rate is the percentage of failed requests out of the total.
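As a minimal sketch of that definition (the status-code list is a made-up sample, not data from the runs above):

```python
def error_rate(status_codes):
    """Percentage of responses whose HTTP status is not 200."""
    if not status_codes:
        return 0.0
    failures = sum(1 for code in status_codes if code != 200)
    return 100.0 * failures / len(status_codes)

# Hypothetical sample: 300 requests, 15 succeeded, the rest returned 502.
sample = [200] * 15 + [502] * 285
print(f"error rate: {error_rate(sample):.1f}%")
```

Note that this counts every non-200 response as an error, including failures produced by the web server in front of Iroha, so a high error rate does not by itself say which layer rejected the request.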

max_proposal_size: 300
proposal_delay: 500
vote_delay: 100,
load_delay: 5000

After running with these parameter values for a while, I got this bug without any peer shutting down. Maybe vote_delay is so short that it hasn't appeared on another peer yet?

screenshot-1

Hi, regarding performance numbers, we are currently in the process of optimizing transactions.
Currently, we do not have a precise number to show. The latest dev branch can reach a throughput of about 300 tx/sec.

Regarding your benchmark, could you please specify how many transactions/sec did you send to iroha? Did you send them to a single peer or a whole network?

Regarding the bug you encountered, we are currently looking into it, please expect fixes to be in the dev branch soon.

@nickaleks Could you send me a configuration for 300 tx/sec?
How many peers, how many hosts, the config.docker file, etc.

Unfortunately, this number is not confirmed right now. As soon as we have a benchmark to show, it will be published.

Again, I read the configuration tips and tried to implement them as suggested: raise max_proposal_size and proposal_delay to handle a lot of transactions.

Here are my settings:
max_proposal_size: 400
proposal_delay: 5000
vote_delay: 400

Here is the result for an individual request using Postman: 6-9 s.

Using the JMeter tool, I sent 400 transactions in 1 second, and here is the result:
screen shot 2018-10-25 at 17 20 10

95% of the transactions failed. If I understand correctly, max_proposal_size: 400 and proposal_delay: 5000 mean that on receiving more than 400 transactions or after 5 s, all pending transactions are processed at once. I wonder why the first 268 transactions failed already? They should have waited 🤔

image
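The expectation described above can be sketched as a size-or-timeout batcher. This only illustrates the understanding stated in this comment, under the assumption that a batch closes once it holds max_proposal_size transactions or once proposal_delay has passed since its first transaction. It is not Iroha's actual ordering logic, and all names are hypothetical:

```python
def batch_by_size_or_delay(arrival_times_ms, max_size, delay_ms):
    """Group arrival timestamps into batches.

    A batch closes when it holds max_size items, or when the next item
    arrives delay_ms or more after the batch's first item.
    """
    batches, current = [], []
    for t in sorted(arrival_times_ms):
        # Timeout: the open batch is older than delay_ms, so close it first.
        if current and t - current[0] >= delay_ms:
            batches.append(current)
            current = []
        current.append(t)
        # Size limit: close the batch as soon as it is full.
        if len(current) == max_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches

# 400 requests spread evenly over one second (2.5 ms apart).
arrivals = [i * 2.5 for i in range(400)]
sizes = [len(b) for b in batch_by_size_or_delay(arrivals, max_size=400, delay_ms=5000)]
print(sizes)
```

Under this model all 400 requests land in a single batch, so failures among the first few hundred requests would have to come from somewhere other than the batching step.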

Regarding your benchmark, could you please specify how many transactions/sec did you send to iroha? Did you send them to a single peer or a whole network?

I sent a total of 400 transactions in 1 second. I have 4 peers, and I distributed the requests randomly among them.

Are you sure your transactions are correct? What is the response status of those transactions?

Yes, I'm sure all transactions are correct. I generate random user names, so no two users have the same name. Here is a sample response:

Thread Name: Create user 1-157
Sample Start: 2018-10-25 17:18:54 ICT
Load time: 384
Connect Time: 192
Latency: 384
Size in bytes: 348
Sent bytes:163
Headers size in bytes: 166
Body size in bytes: 182
Sample Count: 1
Error Count: 1
Data type ("text"|"bin"|""): text
Response code: 502
Response message: Bad Gateway


HTTPSampleResult fields:
ContentType: text/html
DataEncoding: null

All failed transactions have the same response: 502 Bad Gateway.

Here is the error log from nginx (/var/log/nginx/error.log):

2018/10/25 11:16:11 [error] 31138#31138: *4406 connect() to unix:/home/ubuntu/iroha/app.sock failed
(11: Resource temporarily unavailable) while connecting to upstream, client: <client_request_ip>,
server: <host IP>, request: "GET <api_link> HTTP/1.1", upstream:
"http://unix:/home/ubuntu/iroha/app.sock:<api_link>", host: "<host IP>"

Oh, I think I figured it out, give me a second @@

I think I have solved half of the problem.

I wonder why the first 268 transactions failed already? They should have waited 🤔
image

So I increased the value of somaxconn. Simply put, somaxconn is the maximum number of connections that can be queued on a listening socket, i.e. how long the line of waiting connections may get. If more clients try to connect than the backlog allows, the extra connections are dropped.

=> $ sudo nano /proc/sys/net/core/somaxconn and increase the value from 128 to 20000.

Now all connections are placed in the queue to be processed. This is what I got:
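To check the value currently in effect, a small sketch like the following reads the same file (Linux only). The helper names and the requested backlog of 2048 are hypothetical. On Linux, a listen() backlog larger than somaxconn is silently capped to it, which is why raising the limit matters here:

```python
from pathlib import Path

SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")  # Linux only

def read_somaxconn(path=SOMAXCONN_PATH):
    """Return the current listen-backlog cap, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None

def effective_backlog(requested, somaxconn):
    """Linux silently caps a listen() backlog at somaxconn."""
    return min(requested, somaxconn)

print("net.core.somaxconn =", read_somaxconn())
# Hypothetical server asking for a backlog of 2048 under the old limit of 128.
print("effective backlog:", effective_backlog(2048, 128))
```

A value written to /proc/sys only lasts until the next reboot; `sudo sysctl -w net.core.somaxconn=20000` makes the same runtime change, and an entry in /etc/sysctl.conf is needed to keep it.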

screen shot 2018-10-27 at 15 13 08

If my understanding is correct, why does Iroha return only 3 results at a time when I send all 400 transactions at once: 3 requests at 11 s, 3 at 25 s, and 3 at 39 s? I expected the results of all 400 requests at once, since they should all be processed in one chunk 🤔

commented

This is being worked on :) https://jira.hyperledger.org/browse/IR-17 - here's the link

Appreciated! Keep up the hard work! 💪💪💪