scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra

Home Page: http://scylladb.com


perf_simple_query - tps and tasks_per_op regression

roydahan opened this issue · comments

perf_simple_query results from the beginning of May show a regression in tps and tasks_per_op:

Test results

| commit_id | date | run_date_time | version | allocs_per_op | instructions_per_op | mad tps | max tps | median tps | min tps | tasks_per_op |
|---|---|---|---|---|---|---|---|---|---|---|
| af56742 | 20240501 | 2024-05-05 06:17:16 | 5.5.0~dev | 63.07 | 42233.56 | 1189.49 | 157216.74 | 156027.25 | 136553.62 | 14.12 |

Median absolute deviation percentage: 0.76%
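As a sanity check on where that percentage comes from (an assumption on my part: it appears to be the MAD of tps divided by the median tps), the numbers from the af56742 row above reproduce it:

```python
# Hypothetical reconstruction of the "Median absolute deviation percentage":
# mad_tps / median_tps, using the values from the af56742 row above.
mad_tps = 1189.49
median_tps = 156027.25

mad_percentage = 100 * mad_tps / median_tps
print(f"{mad_percentage:.2f}%")  # 0.76%
```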

Results from the last 10 Scylla builds, by date:

| commit_id | date | run_date_time | version | allocs_per_op | instructions_per_op | mad tps | max tps | median tps | min tps | tasks_per_op |
|---|---|---|---|---|---|---|---|---|---|---|
| d8313dd | 20240427 | 2024-04-28 06:09:56 | 5.5.0~dev | 63.07(0.0%) | 42245.21(-0.03%) | 1167.92 | 161519.25(-2.66%) | 160134.1(-2.56%) | 137228.49(-0.49%) | 14.12(0.02%) |
| 65cfb9b | 20240420 | 2024-04-21 06:11:09 | 5.5.0~dev | 62.06(1.62%) | 41850.36(0.92%) | 663.78 | 181138.98(-13.21%) | 178365.74(-12.52%) | 158088.03(-13.62%) | 13.11(7.73%) |
| 0be61e5 | 20240411 | 2024-04-14 06:10:12 | 5.5.0~dev | 62.06(1.63%) | 41919.16(0.75%) | 3361.33 | 187892.73(-16.33%) | 184531.4(-15.45%) | 158260.79(-13.72%) | 13.1(7.81%) |
| 0c74c2c | 20240405 | 2024-04-07 06:10:04 | 5.5.0~dev | 62.05(1.65%) | 41926.32(0.73%) | 1287.74 | 193580.23(-18.78%) | 191967.91(-18.72%) | 170937.59(-20.11%) | 13.09(7.89%) |
| 885cb2a | 20240329 | 2024-03-31 06:11:01 | 5.5.0~dev | 62.07(1.62%) | 41842.21(0.94%) | 638.56 | 171304.21(-8.22%) | 170665.65(-8.58%) | 160497.62(-14.92%) | 13.12(7.67%) |
| 6bd0be7 | 20240327 | 2024-03-28 20:31:38 | 5.5.0~dev | 62.06(1.62%) | 41833.61(0.96%) | 702.4 | 181907.62(-13.57%) | 180613.89(-13.61%) | 152158.16(-10.26%) | 13.11(7.74%) |
| 101fdfc | 20240326 | 2024-03-27 10:25:26 | 5.5.0~dev | 62.07(1.61%) | 41859.02(0.89%) | 459.65 | 169157.99(-7.06%) | 167336.86(-6.76%) | 141499.43(-3.5%) | 13.12(7.65%) |

An initial bisection and investigation was done by @michoecho:

```
git bisect good 65cfb9b4e088
git bisect bad d8313dda43d7
cat >bisect.sh <<'EOF'
git submodule update --init --recursive --jobs=10
ninja build/release/scylla || ninja $(realpath build/release/scylla)
build/release/scylla perf-simple-query --smp=1 2>/dev/null | awk '/median/{exit int($7 > 14)}'
EOF
git bisect run bash bisect.sh
...
3a34bb18cd2207ff51ff0053fc13235848cffd25 is the first bad commit
commit 3a34bb18cd2207ff51ff0053fc13235848cffd25
Author: Patryk Jędrzejczak <patryk.jedrzejczak@scylladb.com>
Date:   Tue Apr 2 12:37:20 2024 +0200

    db: config: make consistent-topology-changes unused

    We make the `consistent-topology-changes` experimental feature
    unused and assumed to be true in 6.0. We remove code branches that
    executed if `consistent-topology-changes` was disabled.
```
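The pass/fail criterion in `bisect.sh` is the `awk` one-liner: on the output line containing "median", it exits 0 (good) when the 7th field is at most 14 and 1 (bad) otherwise. Judging by the tables above, that field is tasks_per_op (roughly 13.1 on good builds, 14.1 on bad ones), though that is an inference, not verified against the tool's output format. A minimal Python sketch of the same check:

```python
# Mirrors the pass/fail test from bisect.sh:
#   awk '/median/{exit int($7 > 14)}'
# git bisect treats exit status 0 as "good" and 1 as "bad".
# The sample lines below are invented for illustration; only the field
# position ($7) and the threshold (14) come from the script above.

def bisect_verdict(perf_output: str, threshold: float = 14.0) -> int:
    """Return 0 (good) if the 7th field of the first 'median' line is <= threshold."""
    for line in perf_output.splitlines():
        if "median" in line:
            return int(float(line.split()[6]) > threshold)  # $7 == 7th field
    return 0  # like awk, exit 0 if no line matched

good_line = "f1 f2 f3 f4 f5 median 13.11"  # invented; 7th field below 14
bad_line = "f1 f2 f3 f4 f5 median 14.12"   # invented; 7th field above 14
print(bisect_verdict(good_line), bisect_verdict(bad_line))  # 0 1
```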

The full discussion about the regression investigation can be found here:
https://groups.google.com/a/scylladb.com/g/scylla-perf-results/c/1TMTpovVvSo/m/1AXhK44-AAAJ?utm_medium=email&utm_source=footer

From @michoecho's investigation, it's caused by `apply_fence`, specifically the continuation it creates:

```
seastar::continuation<seastar::internal::promise_base_with_type<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >, service::storage_proxy::apply_fence<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >, service::fencing_token, gms::inet_address) const::{lambda(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >&&)#1}, seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >::then_wrapped_nrvo<seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >, service::storage_proxy::apply_fence<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >, service::fencing_token, gms::inet_address) const::{lambda(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >&&)#1}>(service::storage_proxy::apply_fence<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >, service::fencing_token, gms::inet_address) const::{lambda(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >&&, service::storage_proxy::apply_fence<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >(seastar::future<auto:1>, service::fencing_token, gms::inet_address) const::{lambda(seastar::future<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >&&)#1}&, seastar::future_state<seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >&&)#1}, seastar::rpc::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, cache_temperature> >::run_and_dispose
```

*(image attached in the original issue)*

Citing some comments from the discussion:

(If this wasn't clear, note that the "bad patch" doesn't actually add work to a real workload — it only makes perf_simple_query aware of the fencing).

Avoiding continuations isn't easy; it's just how composition in Seastar works. If you want to avoid a continuation, you basically have to manually inline it.

The TPS regression is much larger than the instruction overhead. Maybe it's not real, but at least it should be investigated.

The extra instruction/task/allocation may be tolerable and unavoidable, but it's also possible to recover some of it.

If the change in TPS is greater than the change in instructions per op, then this (obviously) means that the IPC (instructions per cycle) is lower.
Analyzing IPC is an advanced art even at the macro level, let alone at the level of 300 instructions.
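To make the IPC observation concrete: cycles_per_op = instructions_per_op / IPC, and at full CPU utilization tps is inversely proportional to cycles_per_op. A back-of-the-envelope comparison of the good build (65cfb9b) against the bad one (d8313dd), using median tps and instructions_per_op from the table above:

```python
# Implied IPC change between the good build (65cfb9b) and the bad build
# (d8313dd), from the table above. Assumes both runs were CPU-bound at
# 100% utilization (a later comment questions exactly this assumption).

tps_good, instr_good = 178365.74, 41850.36  # 65cfb9b
tps_bad, instr_bad = 160134.10, 42245.21    # d8313dd

# tps ∝ 1 / cycles_per_op and cycles_per_op = instr_per_op / IPC, so:
#   IPC_bad / IPC_good = (tps_bad / tps_good) * (instr_bad / instr_good)
ipc_ratio = (tps_bad / tps_good) * (instr_bad / instr_good)
print(f"implied IPC change: {100 * (ipc_ratio - 1):.1f}%")  # roughly -9%
```

With these numbers the implied IPC drop is roughly 9%: the TPS loss (about 10% between these two builds) dwarfs the instruction growth (under 1%), so almost all of the regression would have to come from executing instructions more slowly, not from executing more of them.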

Could also be that we're not running at 100% utilization any more.
Maybe we won't figure it out, but we have to try.

@avikivity do you consider this a release blocker?

> @avikivity do you consider this a release blocker?

No. We may even declare it unavoidable (but I hope we can avoid it).

I put it in the Q2 plan anyway; we might be able to fix it for 6.0 even after we branch, if we deal with all the blockers.