Suspicious Took/Expected time difference at https://perf.rust-lang.org/status.html
klensy opened this issue
The Took and Expected times displayed at https://perf.rust-lang.org/status.html differ heavily for some runs, while the perf results show no difference.
For example, for the run in rust-lang/rust#115388 (comment):
Step | Took | Expected |
---|---|---|
await-call-tree | 0m28s | 0m28s |
bitmaps-3.1.0 | 1m06s | 0m59s |
cargo-0.60.0 | 8m24s | 5m23s |
clap-3.1.6 | 1m43s | 1m32s |
coercions | 0m56s | 0m54s |
cranelift-codegen-0.82.1 | 9m50s | 3m02s |
ctfe-stress-5 | 1m24s | 1m11s |
deeply-nested-multi | 2m43s | 0m32s |
deep-vector | 1m34s | 1m24s |
derive | 4m13s | 0m49s |
diesel-1.4.8 | 7m15s | 2m30s |
exa-0.10.1 | 3m00s | 3m00s |
externs | 0m38s | 0m39s |
helloworld | 0m37s | 0m37s |
helloworld-tiny | 0m11s | 0m11s |
html5ever-0.26.0 | 1m18s | 1m08s |
hyper-0.14.18 | 2m11s | 2m10s |
image-0.24.1 | 2m00s | 2m01s |
issue-46449 | 1m15s | 1m16s |
issue-58319 | 0m32s | 0m32s |
issue-88862 | 0m32s | 0m32s |
libc-0.2.124 | 1m01s | 1m02s |
many-assoc-items | 0m57s | 0m59s |
match-stress | 0m51s | 0m52s |
projection-caching | 0m41s | 0m41s |
regex-1.5.5 | 2m58s | 2m56s |
regression-31157 | 0m54s | 1m01s |
ripgrep-13.0.0 | 2m04s | 2m02s |
ripgrep-13.0.0-tiny | 1m05s | 1m04s |
serde-1.0.136 | 1m33s | 1m33s |
serde_derive-1.0.136 | 1m38s | 1m38s |
stm32f4-0.14.0 | 3m49s | 6m10s |
syn-1.0.89 | 1m28s | 1m24s |
token-stream-stress | 0m28s | 0m32s |
tt-muncher | 0m28s | 1m00s |
Notice the Took/Expected difference for cranelift, cargo and derive, while the results are clean: https://perf.rust-lang.org/compare.html?start=b1b244da6527cf2ba36e88d02275f4c64a0c90d8&end=24259321f2e7a82959b47b86ded3d1073f281746&stat=instructions:u
And the next perf run, for 6ff94474e1d11 (rust-lang/rust#115391 (comment)), bounces back:
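To make "heavily differs" concrete, here is a minimal Rust sketch that parses the status page's `XmYYs` durations and flags steps whose Took and Expected times differ by more than a chosen factor in either direction. The 1.5x threshold and the names `parse_duration` and `suspicious_steps` are hypothetical; this is not how perf.rust-lang.org itself evaluates runs.

```rust
/// Parses a duration in the status page's `XmYYs` format (e.g. "8m24s") into seconds.
fn parse_duration(s: &str) -> Option<u64> {
    let (mins, rest) = s.split_once('m')?;
    let secs = rest.strip_suffix('s')?;
    Some(mins.parse::<u64>().ok()? * 60 + secs.parse::<u64>().ok()?)
}

/// Returns the steps whose Took/Expected times differ by more than `threshold`x.
/// Hypothetical helper: the threshold is an arbitrary choice, not a perf setting.
fn suspicious_steps<'a>(rows: &[(&'a str, &str, &str)], threshold: f64) -> Vec<&'a str> {
    rows.iter()
        .filter_map(|&(step, took, expected)| {
            let took = parse_duration(took)? as f64;
            let expected = parse_duration(expected)? as f64;
            // Divide the larger time by the smaller one so that both
            // "slower than expected" and "faster than expected" are flagged.
            let ratio = took.max(expected) / took.min(expected).max(1.0);
            (ratio > threshold).then_some(step)
        })
        .collect()
}

fn main() {
    // A few rows copied from the first table (run for rust-lang/rust#115388).
    let rows = [
        ("await-call-tree", "0m28s", "0m28s"),
        ("cargo-0.60.0", "8m24s", "5m23s"),
        ("cranelift-codegen-0.82.1", "9m50s", "3m02s"),
        ("derive", "4m13s", "0m49s"),
        ("hyper-0.14.18", "2m11s", "2m10s"),
        ("stm32f4-0.14.0", "3m49s", "6m10s"),
    ];
    let flagged = suspicious_steps(&rows, 1.5);
    println!("{:?}", flagged);
}
```

With these rows it reports cargo, cranelift and derive, and also stm32f4-0.14.0, which in the same run finished noticeably faster than expected.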
Step | Took | Expected |
---|---|---|
await-call-tree | 0m30s | 0m28s |
bitmaps-3.1.0 | 1m02s | 1m06s |
cargo-0.60.0 | 5m26s | 8m24s |
clap-3.1.6 | 1m32s | 1m43s |
coercions | 0m55s | 0m56s |
cranelift-codegen-0.82.1 | 3m04s | 9m50s |
ctfe-stress-5 | 1m12s | 1m24s |
deeply-nested-multi | 0m33s | 2m43s |
deep-vector | 1m25s | 1m34s |
derive | 0m51s | 4m13s |
diesel-1.4.8 | 2m32s | 7m15s |
exa-0.10.1 | 2m59s | 3m00s |
externs | 0m38s | 0m38s |
helloworld | 0m37s | 0m37s |
helloworld-tiny | 0m11s | 0m11s |
html5ever-0.26.0 | 1m09s | 1m18s |
hyper-0.14.18 | 2m10s | 2m11s |
image-0.24.1 | 2m02s | 2m00s |
issue-46449 | 1m16s | 1m15s |
issue-58319 | 0m33s | 0m32s |
issue-88862 | 0m33s | 0m32s |
libc-0.2.124 | 1m01s | 1m01s |
many-assoc-items | 0m58s | 0m57s |
match-stress | 0m52s | 0m51s |
projection-caching | 0m41s | 0m41s |
regex-1.5.5 | 2m56s | 2m58s |
regression-31157 | 0m55s | 0m54s |
ripgrep-13.0.0 | 2m03s | 2m04s |
ripgrep-13.0.0-tiny | 1m05s | 1m05s |
rustc | 7m45s | 13m12s |
serde-1.0.136 | 1m37s | 1m33s |
serde_derive-1.0.136 | 1m41s | 1m38s |
stm32f4-0.14.0 | 3m50s | 3m49s |
syn-1.0.89 | 1m26s | 1m28s |
token-stream-stress | 0m28s | 0m28s |
tt-muncher | 0m45s | 0m44s |
tuple-stress | 0m58s | 0m56s |
ucd | 1m04s | 1m03s |
unicode-normalization-0.1.19 | 0m52s | 0m51s |
unify-linearly | 0m39s | 0m38s |
unused-warnings | 1m00s | 1m01s |
webrender-2022 | 3m46s | 3m46s |
wf-projection-stress-65510 | 0m31s | 0m31s |
wg-grammar | 0m54s | 0m54s |
Quite a sus difference in the numbers, while nothing interesting can be seen in the perf results.
Yep, the first message has a second table, where it can be seen:
Step | Took | Expected |
---|---|---|
cranelift-codegen-0.82.1 | 3m04s | 9m50s |
Both runs were try runs, so a difference between try and post-merge runs can't explain this.
The second table still has the estimate at 9m50s, while I'm saying the estimate is now fixed.
Some try runs may have been slow, and I would also like to know whether we use them for the time estimates: it seems possible if we're actually using the "last run", but I don't know for sure. We'll see when Jakub is back from vacation.
Looking at the numbers, the Expected value really is taken from the latest run:
Run | Step | Took | Expected |
---|---|---|---|
rust-lang/rust#115388 | cranelift-codegen-0.82.1 | 9m50s | 3m02s |
rust-lang/rust#115391 | cranelift-codegen-0.82.1 | 3m04s | 9m50s |
We could check the source code to dig up the truth :-)
My recollection is that we use the last run, but I wasn't sure whether try builds were used or not; as you say, it seems like they are ^^
Yeah, it's always estimated from the most recent run. We should probably ignore try runs though, I'll look into it.
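The behaviour described here (the estimate is simply the Took time of the most recent run) and the suggested change (ignore try runs) can be sketched as follows. The `Run`/`RunKind` types and `expected_secs` are hypothetical and do not mirror the actual rustc-perf code; treating the run that took 3m02s as a master run is an assumption chosen to match the Expected column of the first table.

```rust
use std::collections::HashMap;

/// Hypothetical classification of a benchmark run; not the rustc-perf schema.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum RunKind {
    Try,
    Master,
}

/// A finished run with per-step wall times in seconds (hypothetical shape).
struct Run {
    kind: RunKind,
    took_secs: HashMap<&'static str, u64>,
}

/// Estimates how long `step` will take from the most recent finished run
/// that recorded it. With `skip_try` set, try runs are ignored.
fn expected_secs(history: &[Run], step: &str, skip_try: bool) -> Option<u64> {
    history
        .iter()
        .rev() // newest first; `history` is ordered oldest -> newest
        .filter(|run| !(skip_try && run.kind == RunKind::Try))
        .find_map(|run| run.took_secs.get(step).copied())
}

fn main() {
    let step = "cranelift-codegen-0.82.1";
    let history = vec![
        // Assumed master run; 3m02s matches the first table's Expected value.
        Run { kind: RunKind::Master, took_secs: HashMap::from([(step, 182)]) },
        // The slow try run for rust-lang/rust#115388 (took 9m50s).
        Run { kind: RunKind::Try, took_secs: HashMap::from([(step, 590)]) },
    ];

    // Current behaviour: the last run wins, even if it was a slow try run.
    println!("{:?}", expected_secs(&history, step, false)); // Some(590)
    // Ignoring try runs falls back to the last master run.
    println!("{:?}", expected_secs(&history, step, true)); // Some(182)
}
```

This only changes which run feeds the estimate; it does not explain why the try run for rust-lang/rust#115388 was slow in the first place.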
This probably shouldn't be fixed until the reason for this difference is found. Or maybe it shouldn't be changed at all: with the current behavior we can at least see the difference and start digging; without it, no one will catch this.