rust-lang / rustc-perf

Website for graphing performance of rustc

Home Page: https://perf.rust-lang.org

Suspicious time difference between Took and Expected at https://perf.rust-lang.org/status.html

klensy opened this issue

The Took and Expected times displayed at https://perf.rust-lang.org/status.html differ heavily for some runs, while the perf results show no difference.

For example, for the run in rust-lang/rust#115388 (comment):

Step Took Expected
await-call-tree 0m28s 0m28s
bitmaps-3.1.0 1m06s 0m59s
cargo-0.60.0 8m24s 5m23s
clap-3.1.6 1m43s 1m32s
coercions 0m56s 0m54s
cranelift-codegen-0.82.1 9m50s 3m02s
ctfe-stress-5 1m24s 1m11s
deeply-nested-multi 2m43s 0m32s
deep-vector 1m34s 1m24s
derive 4m13s 0m49s
diesel-1.4.8 7m15s 2m30s
exa-0.10.1 3m00s 3m00s
externs 0m38s 0m39s
helloworld 0m37s 0m37s
helloworld-tiny 0m11s 0m11s
html5ever-0.26.0 1m18s 1m08s
hyper-0.14.18 2m11s 2m10s
image-0.24.1 2m00s 2m01s
issue-46449 1m15s 1m16s
issue-58319 0m32s 0m32s
issue-88862 0m32s 0m32s
libc-0.2.124 1m01s 1m02s
many-assoc-items 0m57s 0m59s
match-stress 0m51s 0m52s
projection-caching 0m41s 0m41s
regex-1.5.5 2m58s 2m56s
regression-31157 0m54s 1m01s
ripgrep-13.0.0 2m04s 2m02s
ripgrep-13.0.0-tiny 1m05s 1m04s
serde-1.0.136 1m33s 1m33s
serde_derive-1.0.136 1m38s 1m38s
stm32f4-0.14.0 3m49s 6m10s
syn-1.0.89 1m28s 1m24s
token-stream-stress 0m28s 0m32s
tt-muncher 0m28s 1m00s

Notice the large Took/Expected differences for cranelift, cargo and derive, while the results are clean: https://perf.rust-lang.org/compare.html?start=b1b244da6527cf2ba36e88d02275f4c64a0c90d8&end=24259321f2e7a82959b47b86ded3d1073f281746&stat=instructions:u
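
To put a number on "heavily differs", here is a minimal sketch that parses the `XmYs` durations from the table and flags steps whose Took/Expected ratio falls outside a chosen band. The helper names and the threshold are hypothetical (nothing like this exists in rustc-perf as far as this thread shows); it only assumes the `<minutes>m<seconds>s` format used on the status page.

```rust
use std::time::Duration;

/// Parses a status-page duration such as "9m50s" or "0m28s".
/// Assumes the `<minutes>m<seconds>s` format seen in the tables above.
fn parse_step_duration(s: &str) -> Option<Duration> {
    let (mins, secs) = s.strip_suffix('s')?.split_once('m')?;
    let total = mins.parse::<u64>().ok()? * 60 + secs.parse::<u64>().ok()?;
    Some(Duration::from_secs(total))
}

/// Returns the steps whose Took/Expected ratio lies outside
/// `[1/threshold, threshold]`. `threshold` should be > 1.0 and is a
/// hypothetical tuning knob. Rows with unparsable durations are skipped;
/// a 0s/0s row yields NaN, which fails both comparisons and is not flagged.
fn suspicious_steps<'a>(rows: &[(&'a str, &str, &str)], threshold: f64) -> Vec<&'a str> {
    rows.iter()
        .filter_map(|&(name, took, expected)| {
            let took = parse_step_duration(took)?.as_secs_f64();
            let expected = parse_step_duration(expected)?.as_secs_f64();
            let ratio = took / expected;
            (ratio > threshold || ratio < 1.0 / threshold).then_some(name)
        })
        .collect()
}
```

Applied to the whole first table with a threshold of 2.0, this flags cranelift-codegen-0.82.1 (590s vs 182s), deeply-nested-multi, derive, diesel-1.4.8 and tt-muncher, but not cargo-0.60.0 (about 1.56x).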

And the next perf run, for 6ff94474e1d11 (rust-lang/rust#115391 (comment)), bounces back:

Step Took Expected
await-call-tree 0m30s 0m28s
bitmaps-3.1.0 1m02s 1m06s
cargo-0.60.0 5m26s 8m24s
clap-3.1.6 1m32s 1m43s
coercions 0m55s 0m56s
cranelift-codegen-0.82.1 3m04s 9m50s
ctfe-stress-5 1m12s 1m24s
deeply-nested-multi 0m33s 2m43s
deep-vector 1m25s 1m34s
derive 0m51s 4m13s
diesel-1.4.8 2m32s 7m15s
exa-0.10.1 2m59s 3m00s
externs 0m38s 0m38s
helloworld 0m37s 0m37s
helloworld-tiny 0m11s 0m11s
html5ever-0.26.0 1m09s 1m18s
hyper-0.14.18 2m10s 2m11s
image-0.24.1 2m02s 2m00s
issue-46449 1m16s 1m15s
issue-58319 0m33s 0m32s
issue-88862 0m33s 0m32s
libc-0.2.124 1m01s 1m01s
many-assoc-items 0m58s 0m57s
match-stress 0m52s 0m51s
projection-caching 0m41s 0m41s
regex-1.5.5 2m56s 2m58s
regression-31157 0m55s 0m54s
ripgrep-13.0.0 2m03s 2m04s
ripgrep-13.0.0-tiny 1m05s 1m05s
rustc 7m45s 13m12s
serde-1.0.136 1m37s 1m33s
serde_derive-1.0.136 1m41s 1m38s
stm32f4-0.14.0 3m50s 3m49s
syn-1.0.89 1m26s 1m28s
token-stream-stress 0m28s 0m28s
tt-muncher 0m45s 0m44s
tuple-stress 0m58s 0m56s
ucd 1m04s 1m03s
unicode-normalization-0.1.19 0m52s 0m51s
unify-linearly 0m39s 0m38s
unused-warnings 1m00s 1m01s
webrender-2022 3m46s 3m46s
wf-projection-stress-65510 0m31s 0m31s
wg-grammar 0m54s 0m54s

That is quite a suspicious difference in the numbers, while nothing interesting can be seen in the perf results.

cranelift is already back to 3 minutes.

Yep, the first message has a second table, where it can be seen:

cranelift-codegen-0.82.1 3m04s 9m50s

Both runs were try runs, so a difference between try and post-merge runs can't explain this.

The second table still has the estimate at 9m50s, while I'm saying the estimate is now fixed.

Some try runs may have been slow, and I would indeed also like to know whether we use them for the time estimates: it seems possible if we're actually using the "last run", but I don't know for sure. We'll see when Jakub is back from vacation.

Looking at the numbers, Expected really is taken from the latest run:

cranelift-codegen-0.82.1 9m50s 3m02s
cranelift-codegen-0.82.1 3m04s 9m50s
And the times you linked for cranelift: 3m08s and 3m04s.
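
The observation above amounts to a simple check: each step's Expected time in one run should equal that step's Took time in the previous run. A minimal sketch of that check over two parsed tables, using plain string maps as a hypothetical representation (not how rustc-perf stores its data). Steps absent from the previous run, such as rustc in the second table, are skipped.

```rust
use std::collections::HashMap;

/// Returns the steps whose Expected time in the newer run differs from the
/// Took time recorded for the same step in the previous run, sorted by name.
/// If Expected is simply "the most recent run's Took", this is empty.
fn steps_not_matching_previous_run<'a>(
    previous_took: &HashMap<&str, &str>,
    next_expected: &HashMap<&'a str, &str>,
) -> Vec<&'a str> {
    let mut mismatches: Vec<&'a str> = next_expected
        .iter()
        .filter(|(step, expected)| {
            // Only compare steps that also appear in the previous run.
            previous_took
                .get(**step)
                .is_some_and(|took| took != *expected)
        })
        .map(|(step, _)| *step)
        .collect();
    mismatches.sort_unstable();
    mismatches
}
```

For example, cargo-0.60.0, cranelift-codegen-0.82.1 and derive took 8m24s, 9m50s and 4m13s in the first run, and those are exactly their Expected values in the second run, so they produce no mismatches.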

We can check the source code to dig out the truth :-)

I recall that we're using the last run, but I wasn't sure whether try builds were used or not; as you say, it seems like they are ^^

Yeah, it's always estimated from the most recent run. We should probably ignore try runs though, I'll look into it.
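
The current rule and the proposed tweak can be sketched as "take the step duration from the most recent completed run, optionally skipping try runs". The `CompletedRun` struct, the `RunKind` enum and all field names below are assumptions for illustration; rustc-perf's actual schema and query are not shown in this thread.

```rust
/// Hypothetical kind of a benchmark run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RunKind {
    Try,
    PostMerge,
}

/// Hypothetical record of one completed run, for a single step.
#[derive(Debug, Clone, Copy)]
struct CompletedRun {
    kind: RunKind,
    /// Completion time (e.g. seconds since an epoch); larger is more recent.
    finished_at: u64,
    /// How long the step took in this run, in seconds.
    step_secs: u64,
}

/// Expected duration for a step: the duration from the most recent run.
/// With `ignore_try`, try runs are excluded, as suggested above.
fn expected_step_secs(runs: &[CompletedRun], ignore_try: bool) -> Option<u64> {
    runs.iter()
        .filter(|run| !(ignore_try && run.kind == RunKind::Try))
        .max_by_key(|run| run.finished_at)
        .map(|run| run.step_secs)
}
```

For instance, if a try run that took 590s (9m50s) finished after a post-merge run that took 182s (3m02s), `expected_step_secs(&runs, false)` returns `Some(590)` and `expected_step_secs(&runs, true)` returns `Some(182)`. The durations are borrowed from the cranelift rows above; the run kinds and ordering are made up.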

This probably shouldn't be fixed until the reason for this difference is found. Or maybe it shouldn't be changed at all: with the current behavior we can at least see the difference and start digging; without it, no one will catch this.