rust-lang / rustc-perf

Website for graphing performance of rustc

Home Page: https://perf.rust-lang.org

Consider passing -Awarnings (or similar) to avoid false alarms from lint *reporting*

pnkfelix opened this issue

See rust-lang/rust#117772 (comment)

There were a number of benchmarks that started hitting a lint and thus started exercising lint-reporting machinery that was previously uninvoked.

Now: If a lint check is expensive to exercise, then we should measure that as part of benchmarking.

But if a new lint fires and causes new diagnostic code paths to run, that sounds like noise to me.

It would be better, IMO, if we prevented the diagnostic machinery from running at all in such cases, except in isolated benchmarks that are themselves dedicated to exercising the efficiency of the diagnostic machinery.

What do others think?

(If the -A flag itself triggers micro-optimizations that cause a given lint to be skipped entirely, that would defeat the purpose of what I seek here, and in that scenario I would propose a new -Z flag for silencing the diagnostic reporting alone, without any other effect taking place...)
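For concreteness, here's a minimal sketch of what passing that flag could look like in a benchmark harness; the `compile_benchmark` helper below is hypothetical, not rustc-perf's actual code:

```rust
use std::process::{Command, ExitStatus};

// Hypothetical helper (not rustc-perf's actual harness code): compile a
// benchmark with -Awarnings so allowed lints never reach the reporting
// machinery during the timed run.
fn compile_benchmark(source: &str) -> std::io::Result<ExitStatus> {
    Command::new("rustc")
        .arg("-Awarnings") // allow (i.e. silence) all warning-level lints
        .arg(source)
        .status()
}
```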

I don't think that we do any short-circuiting on allowed lints, although there have been some unsuccessful attempts to add that in the past, IIRC.

Is it such a big problem in practice that sometimes new lints cause the diagnostics to be triggered? It probably only happens in the stress tests.

Disabling the output completely could also hide regressions, or mask something going wrong in the output path itself.

I'm inclined to keep our compilation pipeline as vanilla as possible, i.e. no special flags.

> Now: If a lint check is expensive to exercise, then we should measure that as part of benchmarking.
>
> But if a new lint fires and causes new diagnostic code paths to run, that sounds like noise to me.

I'm not sure I understand the difference between these two cases. But if a new lint is added and it causes a benchmark to slow down, I think that's a useful signal. We may decide that the slowdown is worth it. I wouldn't want the slowdown to be hidden.

I guess the case I'm worried about is our benchmarks getting a significant hit due to diagnostic I/O overhead.

Here's my mental model: if a lint is added, and it's firing on someone's code, and the added overhead is roughly proportional to the amount of diagnostic output generated (and the amount of extra output is roughly proportional to the number of violations of the lint in the underlying code, perhaps in post-macro-expansion form), then I would consider that cost amortized: yes, the compilation has slowed down, but that extra cost is presumably being "paid off" by the value of whatever the lint is flagging in the underlying code.

On the other hand: if a lint is added and the actual act of running the check is costly, so that an overhead is paid even for code that produces no output from the lint, then that cost should not be considered paid off in the manner described above.

So, here's a throwaway thought: could we pipe the stderr output to /dev/null during benchmarking? Or have a -Z flag with a similar effect, but solely for the lint diagnostic output? Would that at all help address the distinction I'm trying to draw above?

It really is a throwaway thought, and I'm tempted to just close this issue since I can tell that my ideas here are not fully formed. And I agree with @nnethercote that this isn't "noise" in the usual sense we use the term for spurious regressions...

Hmm, now that you mention it, why don't we already pipe stdout/stderr to /dev/null? 🤔 @Mark-Simulacrum, do you remember any justification for this? It seems like that could also slightly reduce noise. If the benchmark fails, we could re-run it with stdout/stderr attached to find the compilation error.
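A minimal sketch of that re-run idea, assuming a hypothetical `run_benchmark` helper rather than rustc-perf's actual collector code:

```rust
use std::process::{Command, Stdio};

// Hypothetical sketch (not rustc-perf's actual collector): run the timed
// compilation with stdout/stderr sent to /dev/null, and only on failure
// re-run with output attached so the error shows up in the logs.
fn run_benchmark(source: &str) -> std::io::Result<()> {
    let status = Command::new("rustc")
        .arg(source)
        .stdout(Stdio::null()) // discard output during the measured run
        .stderr(Stdio::null())
        .status()?;

    if !status.success() {
        // Re-run outside the measurement with inherited stdio to surface
        // the actual compilation error.
        Command::new("rustc").arg(source).status()?;
    }
    Ok(())
}
```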

It makes it super painful to debug actual failures if we swallow output, especially since that means debugging can't be done purely from the logs visible on the public page.

Oh, I stopped reading too soon. Anyway, I think it would be OK to re-run to actually get the output... but it feels like needless complexity, and I'd be worried about losing coverage (e.g., imagine TTY checking starts happening on every character in some slow way). But I'm not too worried about it.

Overall, historically we've just ignored these one-off bumps. They do affect the historical/long-term metrics views, but those are far less important than the commit-by-commit comparison anyway.