keycloak / keycloak-benchmark

Keycloak Benchmark

Home Page: https://www.keycloak.org/keycloak-benchmark/

Review performance change Credential Grants Per Second for external Infinispan

ahus1 opened this issue

Description

The response time changed from 9 ms to 12 ms, as described in #901 (comment).

Discussion

No response

Motivation

The reason is not yet known. It would be good to find the cause and, if it is simple enough, to fix it.

Details

No response

I ran a test with the global accelerator configured with a single endpoint (basically A/P, but using the global accelerator instead of Route 53).

The Gatling result shows 12 ms at the 50th percentile, which is 3 ms higher than in the original A/P test (9 ms, see #901).

[Screenshot: Gatling report]

Below is the 50th percentile in the Grafana dashboard:

[Screenshot: 50th percentile in Grafana]

Based on the Grafana dashboard results, I'm confident this isn't a Keycloak issue. But it raises another question: where are those 10 ms spent?

Using Route 53 (A/P deployment), I observe a lower 50th percentile in Gatling: 9 ms.

[Screenshot from 2024-08-13 16-01-49]

The Grafana dashboard metrics look flatter, but the raw values are not that different from AA.

[Screenshot from 2024-08-13 16-00-47]

@pruivo - thank you for putting us on track to identify the AWS Global Accelerator as the source of this. The second setup looks a little flatter, but that might only be due to the first two warm-up calls; after those, it sits at about 2.3 ms, the same as the one above.

The difference between 2.3 ms and 9 ms is most likely the time needed to establish the TCP connection and then the TLS session on top of it. The additional 3 ms are then likely due to something extra that the AWS Global Accelerator is doing.
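The arithmetic behind this breakdown can be written out as a small sketch. The three input values are the numbers reported in this thread; the variable names and the attribution of each difference are assumptions that follow the reasoning above, not separate measurements.

```python
# 50th-percentile values reported in this thread, in milliseconds.
server_side_p50_ms = 2.3    # Grafana dashboard, measured on the Keycloak side
route53_p50_ms = 9.0        # Gatling, A/P deployment behind Route 53
accelerator_p50_ms = 12.0   # Gatling, global accelerator with a single endpoint

# Assumption: time spent outside Keycloak is mostly TCP + TLS connection setup.
connection_overhead_ms = round(route53_p50_ms - server_side_p50_ms, 1)

# Assumption: the remaining difference is added by the global accelerator path.
accelerator_overhead_ms = round(accelerator_p50_ms - route53_p50_ms, 1)

# Total client-side time not accounted for by Keycloak in the accelerator test.
outside_keycloak_ms = round(accelerator_p50_ms - server_side_p50_ms, 1)

print(f"TCP/TLS setup (assumed):        {connection_overhead_ms} ms")
print(f"Accelerator overhead (assumed): {accelerator_overhead_ms} ms")
print(f"Total outside Keycloak:         {outside_keycloak_ms} ms")
```

The last value, 9.7 ms, corresponds to the roughly 10 ms that the earlier comment asks about.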

So with all the evidence you collected, we don't have a regression in Keycloak after this change. The increased latency is then due to the AWS Global Accelerator, which might (or might not) add a hop between AZs. https://www.cloudping.co/grid indicates 3 ms inside the eu-west-1 region, so this seems to be a good match.

Unfortunately, neither the reduced Gatling HTML report nor the original report gives us response time histograms (only percentiles), so we can't see whether there is a "faster" and a "slower" half of requests.
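If raw per-request response times were available, a simple bucket count would be enough to see whether the requests split into a faster and a slower group. The following is a minimal sketch of that idea; the `histogram` helper and the sample values are hypothetical placeholders for illustration, not data from this benchmark and not a Gatling API.

```python
from collections import Counter


def histogram(response_times_ms, bucket_width_ms=1):
    """Count requests per bucket; each key is the lower bound of its bucket in ms."""
    counts = Counter(
        int(t // bucket_width_ms) * bucket_width_ms for t in response_times_ms
    )
    return dict(sorted(counts.items()))


# Hypothetical sample values, for illustration only.
sample_ms = [9.1, 9.4, 9.8, 12.2, 12.5, 15.0]

for lower_bound, count in histogram(sample_ms).items():
    # One '#' per request makes two separate clusters easy to spot.
    print(f"{lower_bound:>3} ms | {'#' * count}")
```

Two clearly separated clusters in such an output would point to two request paths with different latency, which percentiles alone cannot show.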

So I'm closing this issue. Again, thanks for the thorough analysis.