Suspicious time diff at https://perf.rust-lang.org/status.html in took/expected

klensy opened this issue 9 months ago

The took/expected times displayed at https://perf.rust-lang.org/status.html differ heavily for some runs, while the perf results show no difference.

For example, for the run in rust-lang/rust#115388 (comment):

Step	Took	Expected
await-call-tree	0m28s	0m28s
bitmaps-3.1.0	1m06s	0m59s
cargo-0.60.0	8m24s	5m23s
clap-3.1.6	1m43s	1m32s
coercions	0m56s	0m54s
cranelift-codegen-0.82.1	9m50s	3m02s
ctfe-stress-5	1m24s	1m11s
deeply-nested-multi	2m43s	0m32s
deep-vector	1m34s	1m24s
derive	4m13s	0m49s
diesel-1.4.8	7m15s	2m30s
exa-0.10.1	3m00s	3m00s
externs	0m38s	0m39s
helloworld	0m37s	0m37s
helloworld-tiny	0m11s	0m11s
html5ever-0.26.0	1m18s	1m08s
hyper-0.14.18	2m11s	2m10s
image-0.24.1	2m00s	2m01s
issue-46449	1m15s	1m16s
issue-58319	0m32s	0m32s
issue-88862	0m32s	0m32s
libc-0.2.124	1m01s	1m02s
many-assoc-items	0m57s	0m59s
match-stress	0m51s	0m52s
projection-caching	0m41s	0m41s
regex-1.5.5	2m58s	2m56s
regression-31157	0m54s	1m01s
ripgrep-13.0.0	2m04s	2m02s
ripgrep-13.0.0-tiny	1m05s	1m04s
serde-1.0.136	1m33s	1m33s
serde_derive-1.0.136	1m38s	1m38s
stm32f4-0.14.0	3m49s	6m10s
syn-1.0.89	1m28s	1m24s
token-stream-stress	0m28s	0m32s
tt-muncher	0m28s	1m00s

Notice the took/expected difference for cranelift, cargo, and derive, while the perf results are clean: https://perf.rust-lang.org/compare.html?start=b1b244da6527cf2ba36e88d02275f4c64a0c90d8&end=24259321f2e7a82959b47b86ded3d1073f281746&stat=instructions:u
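The outliers can also be picked out mechanically. Below is a minimal sketch, assuming durations in the `XmYYs` form shown in the table and an arbitrary 1.5x threshold; the helper names `parse_duration` and `suspicious_steps` and the threshold are illustrative and not part of rustc-perf.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a status-page duration such as '9m50s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


def suspicious_steps(rows, threshold=1.5):
    """Return (step, ratio) pairs where took/expected is off by more than
    `threshold`x in either direction. The threshold is an assumption."""
    flagged = []
    for step, took, expected in rows:
        ratio = parse_duration(took) / parse_duration(expected)
        if ratio > threshold or ratio < 1 / threshold:
            flagged.append((step, round(ratio, 2)))
    return flagged


# A few rows copied from the first table above: (step, took, expected).
rows = [
    ("await-call-tree", "0m28s", "0m28s"),
    ("cargo-0.60.0", "8m24s", "5m23s"),
    ("cranelift-codegen-0.82.1", "9m50s", "3m02s"),
    ("derive", "4m13s", "0m49s"),
    ("exa-0.10.1", "3m00s", "3m00s"),
]

for step, ratio in suspicious_steps(rows):
    print(f"{step}: took/expected = {ratio}")
```

With these rows, only cargo, cranelift-codegen, and derive are flagged; the two steps whose times match are not.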

The next perf run, for 6ff94474e1d11 (rust-lang/rust#115391 (comment)), bounces back:

Step	Took	Expected
await-call-tree	0m30s	0m28s
bitmaps-3.1.0	1m02s	1m06s
cargo-0.60.0	5m26s	8m24s
clap-3.1.6	1m32s	1m43s
coercions	0m55s	0m56s
cranelift-codegen-0.82.1	3m04s	9m50s
ctfe-stress-5	1m12s	1m24s
deeply-nested-multi	0m33s	2m43s
deep-vector	1m25s	1m34s
derive	0m51s	4m13s
diesel-1.4.8	2m32s	7m15s
exa-0.10.1	2m59s	3m00s
externs	0m38s	0m38s
helloworld	0m37s	0m37s
helloworld-tiny	0m11s	0m11s
html5ever-0.26.0	1m09s	1m18s
hyper-0.14.18	2m10s	2m11s
image-0.24.1	2m02s	2m00s
issue-46449	1m16s	1m15s
issue-58319	0m33s	0m32s
issue-88862	0m33s	0m32s
libc-0.2.124	1m01s	1m01s
many-assoc-items	0m58s	0m57s
match-stress	0m52s	0m51s
projection-caching	0m41s	0m41s
regex-1.5.5	2m56s	2m58s
regression-31157	0m55s	0m54s
ripgrep-13.0.0	2m03s	2m04s
ripgrep-13.0.0-tiny	1m05s	1m05s
rustc	7m45s	13m12s
serde-1.0.136	1m37s	1m33s
serde_derive-1.0.136	1m41s	1m38s
stm32f4-0.14.0	3m50s	3m49s
syn-1.0.89	1m26s	1m28s
token-stream-stress	0m28s	0m28s
tt-muncher	0m45s	0m44s
tuple-stress	0m58s	0m56s
ucd	1m04s	1m03s
unicode-normalization-0.1.19	0m52s	0m51s
unify-linearly	0m39s	0m38s
unused-warnings	1m00s	1m01s
webrender-2022	3m46s	3m46s
wf-projection-stress-65510	0m31s	0m31s
wg-grammar	0m54s	0m54s

The timing differences are quite suspicious, while nothing interesting shows up in the perf results.

Rémy Rakic · Answer 1 · Fri Sep 01 2023 03:35:41 GMT+0800 (China Standard Time)

cranelift is already back to 3 mins

klensy · Answer 2 · Fri Sep 01 2023 03:37:48 GMT+0800 (China Standard Time)

Yep, the second table in the first message shows that:

cranelift-codegen-0.82.1	3m04s	9m50s

klensy · Answer 3 · Fri Sep 01 2023 03:41:00 GMT+0800 (China Standard Time)

Both runs were try runs, so a difference between try and post-merge runs can't be the cause here.

Rémy Rakic · Answer 4 · Fri Sep 01 2023 03:42:51 GMT+0800 (China Standard Time)

The second table still has the estimate at 9m50s, while I'm saying the estimate is now fixed.

Some try runs may have been slow and I also indeed would like to know whether we use them for the time estimates: it seems possible if we're actually using the "last run", but I don't know for sure. We'll see when jakub is back from vacation.

klensy · Answer 5 · Fri Sep 01 2023 03:47:17 GMT+0800 (China Standard Time)

Looking at the numbers, the Expected values really are taken from the latest run:

cranelift-codegen-0.82.1	9m50s	3m02s

cranelift-codegen-0.82.1	3m04s	9m50s

And the times you linked for cranelift: 3m08s, 3m04s.

We can check the source code to find out for sure :-)
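The observation above can be checked against the two tables directly. This is a minimal sketch using a handful of rows copied from the first message; the function name `mismatched_steps` is illustrative, and the comparison is plain string equality on the displayed values.

```python
# step -> (took, expected), copied from the two tables in the first message.
first_run = {
    "cargo-0.60.0": ("8m24s", "5m23s"),
    "cranelift-codegen-0.82.1": ("9m50s", "3m02s"),
    "derive": ("4m13s", "0m49s"),
    "stm32f4-0.14.0": ("3m49s", "6m10s"),
    "tt-muncher": ("0m28s", "1m00s"),
}
second_run = {
    "cargo-0.60.0": ("5m26s", "8m24s"),
    "cranelift-codegen-0.82.1": ("3m04s", "9m50s"),
    "derive": ("0m51s", "4m13s"),
    "stm32f4-0.14.0": ("3m50s", "3m49s"),
    "tt-muncher": ("0m45s", "0m44s"),
}


def mismatched_steps(previous, current):
    """List steps whose 'expected' in `current` differs from 'took' in `previous`."""
    return [
        step
        for step, (_took, expected) in current.items()
        if step in previous and previous[step][0] != expected
    ]


print(mismatched_steps(first_run, second_run))
```

For this sample, every step's Expected in the second run equals its Took in the first run, except tt-muncher, which is the last row listed in the first table.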

Rémy Rakic · Answer 6 · Fri Sep 01 2023 04:00:37 GMT+0800 (China Standard Time)

It's my recollection that we're using the last run, but I wasn't sure if try builds were used or not. As you say, it seems like they are ^^

Jakub Beránek · Answer 7 · Fri Sep 01 2023 04:01:09 GMT+0800 (China Standard Time)

Yeah, it's always estimated from the most recent run. We should probably ignore try runs though, I'll look into it.
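As a rough illustration of that idea, here is a sketch of taking the estimate from the most recent run while optionally skipping try runs. The `Run` record, its fields, and the commit labels are all hypothetical and do not reflect rustc-perf's actual data model or code; only the two durations (3m02s and 9m50s, in seconds) come from the tables above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Run:
    """Hypothetical record of one finished benchmark run."""
    commit: str
    is_try: bool
    step_seconds: Dict[str, int]  # step name -> duration in seconds


def expected_durations(history: List[Run], ignore_try: bool = True) -> Optional[Dict[str, int]]:
    """Return the step durations of the most recent eligible run.

    `history` is assumed to be ordered oldest to newest. Returns None
    when no run qualifies.
    """
    for run in reversed(history):
        if ignore_try and run.is_try:
            continue
        return run.step_seconds
    return None


# Hypothetical history: a normal run followed by a slow try run.
history = [
    Run("master-a", False, {"cranelift-codegen-0.82.1": 182}),
    Run("try-b", True, {"cranelift-codegen-0.82.1": 590}),
]

print(expected_durations(history, ignore_try=False))  # estimate from the try run
print(expected_durations(history, ignore_try=True))   # try run skipped
```

The trade-off is that skipping try runs makes the estimate more stable, but it also hides slow try runs from the status page.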

klensy · Answer 8 · Fri Sep 01 2023 06:21:39 GMT+0800 (China Standard Time)

> Yeah, it's always estimated from the most recent run. We should probably ignore try runs though, I'll look into it.

This probably shouldn't be fixed until the reason for this difference is found. Or maybe it shouldn't be changed at all: with the current behavior we can at least see the difference and start digging; without it, no one will catch this.