rust-lang / rustc-perf

Website for graphing performance of rustc

Home Page: https://perf.rust-lang.org

Consider passing -Awarnings (or similar) to avoid false alarms from lint *reporting*

pnkfelix opened this issue

See rust-lang/rust#117772 (comment)

There were a number of benchmarks that started hitting a lint and thus started exercising lint-reporting machinery that was previously uninvoked.

Now: If a lint check is expensive to exercise, then we should measure that as part of benchmarking.

But if a new lint fires and causes new diagnostic code paths to run, that sounds like noise to me.

It would be better, IMO, if we prevented the diagnostic machinery from running at all in such cases, except in isolated benchmarks that are themselves dedicated to exercising the efficiency of the diagnostic machinery.

What do others think?

(If the -A flag itself triggers micro-optimizations that cause a given lint to be skipped entirely, that would defeat the purpose of what I seek here, and in that scenario I would propose a new -Z flag for silencing the diagnostic reporting alone, without any other effect taking place...)
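For concreteness, here's a minimal sketch of what passing that flag could look like in a benchmark harness; the `compile_benchmark` helper below is hypothetical, not rustc-perf's actual code:

```rust
use std::process::{Command, ExitStatus};

// Hypothetical helper (not rustc-perf's actual harness code): compile a
// benchmark with -Awarnings so allowed lints never reach the reporting
// machinery during the timed run.
fn compile_benchmark(source: &str) -> std::io::Result<ExitStatus> {
    Command::new("rustc")
        .arg("-Awarnings") // allow (i.e. silence) all warning-level lints
        .arg(source)
        .status()
}
```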

I don't think that we do any short-circuiting on allowed lints, although there have been some unsuccessful attempts to add that in the past, IIRC.

Is it such a big problem in practice that sometimes new lints cause the diagnostics to be triggered? It probably only happens in the stress tests.

Disabling the output completely could also hide regressions, or mask something going wrong in the output path itself.

I'm inclined to keep our compilation pipeline as vanilla as possible, i.e. no special flags.

> Now: If a lint check is expensive to exercise, then we should measure that as part of benchmarking.
>
> But if a new lint fires and causes new diagnostic code paths to run, that sounds like noise to me.

I'm not sure I understand the difference between these two cases. But if a new lint is added and it causes a benchmark to slow down, I think that's a useful signal. We may decide that the slowdown is worth it. I wouldn't want the slowdown to be hidden.

I guess the case I'm worried about is our benchmarks getting a significant hit due to diagnostic I/O overhead.

Here's my mental model: if a lint is added, and it's firing on someone's code, and the added overhead is roughly proportional to the amount of diagnostic output generated (and the amount of extra output is roughly proportional to the number of violations of the lint in the underlying code, perhaps in post-macro-expansion form), then I would consider that cost amortized: yes, the compilation has slowed down, but that extra cost is presumably being "paid off" by the value of whatever the lint is flagging in the underlying code.

On the other hand: if a lint is added and the actual act of running the check is costly, so that an overhead is paid even for code that produces no output from the lint, then that cost should not be considered paid off in the manner described above.

So, here's a throwaway thought: could we pipe the stderr output to /dev/null during benchmarking? Or have a -Z flag with a similar effect, but solely for the lint diagnostic output? Would that at all help address the distinction I'm trying to draw above?

It really is a throwaway thought, and I'm tempted to just close this issue since I can tell that my ideas here are not fully formed. And I agree with @nnethercote that this isn't "noise" in the usual sense we use the term for spurious regressions...

Hmm, now that you mention it, why don't we already pipe stdout/stderr to /dev/null? 🤔 @Mark-Simulacrum, do you remember any justification for this? It seems like that could also slightly reduce noise. If the benchmark fails, we could re-run it with stdout/stderr attached to find the compilation error.
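A minimal sketch of that re-run idea, assuming a hypothetical `run_benchmark` helper rather than rustc-perf's actual collector code:

```rust
use std::process::{Command, Stdio};

// Hypothetical sketch (not rustc-perf's actual collector): run the timed
// compilation with stdout/stderr sent to /dev/null, and only on failure
// re-run with output attached so the error shows up in the logs.
fn run_benchmark(source: &str) -> std::io::Result<()> {
    let status = Command::new("rustc")
        .arg(source)
        .stdout(Stdio::null()) // discard output during the measured run
        .stderr(Stdio::null())
        .status()?;

    if !status.success() {
        // Re-run outside the measurement with inherited stdio to surface
        // the actual compilation error.
        Command::new("rustc").arg(source).status()?;
    }
    Ok(())
}
```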

It makes it super painful to debug actual failures if we swallow output, especially since that means debugging can't be done purely from the logs visible on the public page.

Oh, I stopped reading too soon. Anyway, I think it would be OK to re-run to actually get the output... but it feels like needless complexity, and I'd be worried about losing coverage (e.g., imagine TTY checking starts happening on every character in some slow way). But I'm not too worried about it.

Overall, historically we've just ignored these one-off bumps. They do affect the historical/long-term metrics views, but those are far less important than the commit-by-commit comparison anyway.