scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra

Home Page: http://scylladb.com


perf_simple_query - tps and tasks_per_op regression

roydahan opened this issue · comments

perf_simple_query results from the beginning of May show a regression in tps and tasks_per_op:

Test results

| commit_id | date | run_date_time | version | allocs_per_op | instructions_per_op | mad tps | max tps | median tps | min tps | tasks_per_op |
|---|---|---|---|---|---|---|---|---|---|---|
| af56742 | 20240501 | 2024-05-05 06:17:16 | 5.5.0~dev | 63.07 | 42233.56 | 1189.49 | 157216.74 | 156027.25 | 136553.62 | 14.12 |

Median absolute deviation percentage: 0.76%
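As a sanity check on where that percentage comes from (an assumption on my part: it appears to be the MAD of tps divided by the median tps), the numbers from the af56742 row above reproduce it:

```python
# Hypothetical reconstruction of the "Median absolute deviation percentage":
# mad_tps / median_tps, using the values from the af56742 row above.
mad_tps = 1189.49
median_tps = 156027.25

mad_percentage = 100 * mad_tps / median_tps
print(f"{mad_percentage:.2f}%")  # 0.76%
```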

Results from the last 10 Scylla builds, by date:

| commit_id | date | run_date_time | version | allocs_per_op | instructions_per_op | mad tps | max tps | median tps | min tps | tasks_per_op |
|---|---|---|---|---|---|---|---|---|---|---|
| d8313dd | 20240427 | 2024-04-28 06:09:56 | 5.5.0~dev | 63.07(0.0%) | 42245.21(-0.03%) | 1167.92 | 161519.25(-2.66%) | 160134.1(-2.56%) | 137228.49(-0.49%) | 14.12(0.02%) |
| 65cfb9b | 20240420 | 2024-04-21 06:11:09 | 5.5.0~dev | 62.06(1.62%) | 41850.36(0.92%) | 663.78 | 181138.98(-13.21%) | 178365.74(-12.52%) | 158088.03(-13.62%) | 13.11(7.73%) |
| 0be61e5 | 20240411 | 2024-04-14 06:10:12 | 5.5.0~dev | 62.06(1.63%) | 41919.16(0.75%) | 3361.33 | 187892.73(-16.33%) | 184531.4(-15.45%) | 158260.79(-13.72%) | 13.1(7.81%) |
| 0c74c2c | 20240405 | 2024-04-07 06:10:04 | 5.5.0~dev | 62.05(1.65%) | 41926.32(0.73%) | 1287.74 | 193580.23(-18.78%) | 191967.91(-18.72%) | 170937.59(-20.11%) | 13.09(7.89%) |
| 885cb2a | 20240329 | 2024-03-31 06:11:01 | 5.5.0~dev | 62.07(1.62%) | 41842.21(0.94%) | 638.56 | 171304.21(-8.22%) | 170665.65(-8.58%) | 160497.62(-14.92%) | 13.12(7.67%) |
| 6bd0be7 | 20240327 | 2024-03-28 20:31:38 | 5.5.0~dev | 62.06(1.62%) | 41833.61(0.96%) | 702.4 | 181907.62(-13.57%) | 180613.89(-13.61%) | 152158.16(-10.26%) | 13.11(7.74%) |
| 101fdfc | 20240326 | 2024-03-27 10:25:26 | 5.5.0~dev | 62.07(1.61%) | 41859.02(0.89%) | 459.65 | 169157.99(-7.06%) | 167336.86(-6.76%) | 141499.43(-3.5%) | 13.12(7.65%) |

An initial bisection and investigation was done by @michoecho:

```
git bisect good 65cfb9b4e088
git bisect bad d8313dda43d7
cat >bisect.sh <<'EOF'
git submodule update --init --recursive --jobs=10
ninja build/release/scylla || ninja $(realpath build/release/scylla)
build/release/scylla perf-simple-query --smp=1 2>/dev/null | awk '/median/{exit int($7 > 14)}'
EOF
git bisect run bash bisect.sh
...
3a34bb18cd2207ff51ff0053fc13235848cffd25 is the first bad commit
commit 3a34bb18cd2207ff51ff0053fc13235848cffd25
Author: Patryk Jędrzejczak <patryk.jedrzejczak@scylladb.com>
Date:   Tue Apr 2 12:37:20 2024 +0200

    db: config: make consistent-topology-changes unused

    We make the `consistent-topology-changes` experimental feature
    unused and assumed to be true in 6.0. We remove code branches that
    executed if `consistent-topology-changes` was disabled.
```
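The pass/fail criterion in `bisect.sh` is the `awk` one-liner: on the output line containing "median", it exits 0 (good) when the 7th field is at most 14 and 1 (bad) otherwise. Judging by the tables above, that field is tasks_per_op (roughly 13.1 on good builds, 14.1 on bad ones), though that is an inference, not verified against the tool's output format. A minimal Python sketch of the same check:

```python
# Mirrors the pass/fail test from bisect.sh:
#   awk '/median/{exit int($7 > 14)}'
# git bisect treats exit status 0 as "good" and 1 as "bad".
# The sample lines below are invented for illustration; only the field
# position ($7) and the threshold (14) come from the script above.

def bisect_verdict(perf_output: str, threshold: float = 14.0) -> int:
    """Return 0 (good) if the 7th field of the first 'median' line is <= threshold."""
    for line in perf_output.splitlines():
        if "median" in line:
            return int(float(line.split()[6]) > threshold)  # $7 == 7th field
    return 0  # like awk, exit 0 if no line matched

good_line = "f1 f2 f3 f4 f5 median 13.11"  # invented; 7th field below 14
bad_line = "f1 f2 f3 f4 f5 median 14.12"   # invented; 7th field above 14
print(bisect_verdict(good_line), bisect_verdict(bad_line))  # 0 1
```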

The full discussion about the regression investigation can be found here:
https://groups.google.com/a/scylladb.com/g/scylla-perf-results/c/1TMTpovVvSo/m/1AXhK44-AAAJ?utm_medium=email&utm_source=footer

From @michoecho's investigation, it's caused by `apply_fence`, specifically the continuation it creates:

```
seastar::continuation<seastar::internal::promise_base_with_type<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >, service::storage_proxy::apply_fence<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >, service::fencing_token, gms::inet_address) const::{lambda(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >&&)#1}, seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >::then_wrapped_nrvo<seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >, service::storage_proxy::apply_fence<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >, service::fencing_token, gms::inet_address) const::{lambda(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >&&)#1}>(service::storage_proxy::apply_fence<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >, service::fencing_token, gms::inet_address) const::{lambda(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >&&, service::storage_proxy::apply_fence<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >(seastar::future<auto:1>, service::fencing_token, gms::inet_address) const::{lambda(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >&&)#1}&, seastar::future_state<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >&&)#1}, seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >::run_and_dispose
```

*(image attached in the original issue)*

Citing some comments from the discussion:

(If this wasn't clear, note that the "bad patch" doesn't actually add work to a real workload — it only makes perf_simple_query aware of the fencing).

Avoiding continuations isn't easy; it's just how composition in Seastar works. If you want to avoid a continuation, you basically have to manually inline it.

The TPS regression is much larger than the instruction overhead. Maybe it's not real, but at least it should be investigated.

The extra instruction/task/allocation may be tolerable and unavoidable, but it's also possible to recover some of it.

If the change in TPS is greater than the change in instructions per op, then this (obviously) means that the IPC (instructions per cycle) is lower.
Analyzing IPC is an advanced art even at the macro level, let alone at the level of 300 instructions.
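To make the IPC observation concrete: cycles_per_op = instructions_per_op / IPC, and at full CPU utilization tps is inversely proportional to cycles_per_op. A back-of-the-envelope comparison of the good build (65cfb9b) against the bad one (d8313dd), using median tps and instructions_per_op from the table above:

```python
# Implied IPC change between the good build (65cfb9b) and the bad build
# (d8313dd), from the table above. Assumes both runs were CPU-bound at
# 100% utilization (a later comment questions exactly this assumption).

tps_good, instr_good = 178365.74, 41850.36  # 65cfb9b
tps_bad, instr_bad = 160134.10, 42245.21    # d8313dd

# tps ∝ 1 / cycles_per_op and cycles_per_op = instr_per_op / IPC, so:
#   IPC_bad / IPC_good = (tps_bad / tps_good) * (instr_bad / instr_good)
ipc_ratio = (tps_bad / tps_good) * (instr_bad / instr_good)
print(f"implied IPC change: {100 * (ipc_ratio - 1):.1f}%")  # roughly -9%
```

With these numbers the implied IPC drop is roughly 9%: the TPS loss (about 10% between these two builds) dwarfs the instruction growth (under 1%), so almost all of the regression would have to come from executing instructions more slowly, not from executing more of them.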

Could also be that we're not running at 100% utilization any more.
Maybe we won't figure it out, but we have to try.

@avikivity do you consider this a release blocker?

> @avikivity do you consider this a release blocker?

No. We may even declare it unavoidable (but I hope we can avoid it).

I put it in the Q2 plan anyway; we might be able to fix it for 6.0 even after we branch, if we deal with all the blockers.