ansible / team-devtools

Shared practices, workflows and decisions impacting Ansible devtools projects

Home Page: https://ansible.readthedocs.io/projects/team-devtools/


proposal: avoid GitHub Actions jobs known to be currently broken

ssbarnea opened this issue

In order to avoid confusing contributors and reviewers, we should avoid enabling or keeping CI jobs that are known to be broken.

The only exception to this rule should be if someone is already actively working on fixing these.

If a job is broken for more than a week or two, we should just disable the job and eventually document that use-case as unsupported, at least until we can make it green again.

Reasoning

When looking at the list of open pull requests, anyone should be able to see a green checkmark on changes that passed CI. Once this happens, they can review them.

If a project has known-broken jobs, all pull requests will report a red check, making it much harder to identify those that are ready for review. If the project happens to have more than 8-10 CI jobs, it becomes even harder for the reviewer to tell whether a genuinely required job failed or whether only the known failures caused the red mark.

As seen in the screenshot example below, none of the pending pull requests have a green checkmark on them because that project has CI jobs known to be broken. That makes reviewing less likely to happen: reviewer time is limited, and a reviewer cannot be expected to investigate each existing PR to check whether it was updated and whether only broken jobs are causing the red mark on it.

[screenshot]

Other options

If we identify a way to still run broken jobs but avoid changing the final mark from green to red we should keep them. Sadly, AFAIK that is not possible with GHA at this time. Even if we skip jobs, the final mark will be red.

+1 for this proposal. I am in favour of removing GHA jobs that fail for a known reason until we fix them (an issue can be raised to track it), to avoid confusion and save PR submitters the time of trying to figure out why the job is failing.

+0...... If the job was once working, it should be fixed or the change that broke the job reverted. If the job never worked, it shouldn't have been enabled.

IMO, avoiding "known-as-broken" jobs entirely might be the ultimate fix.

@cidrblock Windows building never worked, as we did not have any CI jobs for it. It was added as an emergency/fast-tracked change in ansible/ansible-language-server#18 just before we both went on vacation, as we wanted to ensure that whatever came in while we were drinking margaritas was not going to give us more headaches when we got back.

If you read the comments on the change you will see that I was against adding it, along with a few other remarks. Still, I approved and merged it with @webknjaz because having no CI at all was far worse.

We will discuss it in our next team meeting, but it would be very useful if everyone researched the pros/cons before the meeting.

@ssbarnea I may have missed this was concerning a specific issue/PR and/or repo in the explanation above... My bad.

I'm "-2" on this. At least, on removing the jobs entirely. And here's why:

In order to avoid confusing contributors and reviewers, we should avoid enabling or keeping CI jobs that are known to be broken.

Did any contributor get confused by this? What was the case? It looks like you're suggesting a solution to a different problem or maybe you're mixing up several cases that seem related to you (see below).

If a job is broken for more than a week or two, we should just disable the job and eventually document that use-case as unsupported, at least until we can make it green again.

In the Python ecosystem, we'd use an xfail for it. I think we should emulate something similar here.
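To make that concrete, here is a minimal pytest sketch (hypothetical function and test names) of what an xfail marker does: the known-broken test still runs, its failure is reported as XFAIL, and the run as a whole stays green.

```python
import pytest


def broken_feature() -> str:
    # Stand-in for behaviour that is known to be broken today.
    return "wrong value"


@pytest.mark.xfail(reason="known broken, tracked in a pinned issue")
def test_broken_feature():
    # This assertion fails today; pytest reports it as XFAIL rather than
    # FAILED, so the overall run still exits 0 (green).
    assert broken_feature() == "right value"
```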

Reasoning

When looking at the list of open pull requests, anyone should be able to see a green checkmark on changes that passed CI. Once this happens, they can review them.

If a project has known-broken jobs, all pull requests will report a red check, making it much harder to identify those that are ready for review. If the project happens to have more than 8-10 CI jobs, it becomes even harder for the reviewer to tell whether a genuinely required job failed or whether only the known failures caused the red mark.

As seen in the screenshot example below, none of the pending pull requests have a green checkmark on them because that project has CI jobs known to be broken. That makes reviewing less likely to happen: reviewer time is limited, and a reviewer cannot be expected to investigate each existing PR to check whether it was updated and whether only broken jobs are causing the red mark on it.

This is not a realistic workflow for many, though. Most (all?) of the review tools distinguish the "ready for review" and "the CI is passing" statuses. They are different things. Many of us have seen Gerrit at least once in a lifetime: it encourages reviews of the code (and the commit message, for that matter) and it shows those separately from the CI results. Some platforms signal this with a WIP marker in the title.

In the case of GitHub, the flag for whether something should be reviewed or not is the Draft status, which is literally controlled by a button saying Ready for review. One shouldn't rely on the CI status to decide whether to start reviewing. Moreover, the CI may be broken and the contributor may need help to get it fixed; if you dismiss them, it turns into a weird loop of not getting reviews because the CI is red and not getting the CI fixed because there are no reviews, while the maintainers don't even know that someone needs help.

Also, the CI may be temporarily broken because of minor linter issues in the PR, and the author could benefit from getting asynchronous feedback while they are fixing the linting violations. Waiting/blocking on unrelated things is counterproductive. The CI checks exist to prevent broken merges, not to prevent reviews.

That said, we should encourage reviews of any PR that is not marked as a Draft, and reserve the Draft label as a unified way of signaling "do not attempt to review this, it's not yet ready".

Other options

If we identify a way to still run broken jobs but avoid changing the final mark from green to red we should keep them. Sadly, AFAIK that is not possible with GHA at this time. Even if we skip jobs, the final mark will be red.

Skipping the jobs doesn't add a red mark AFAIK but continue-on-error does pass it through, yes. I've been meaning to experiment with the idea of making the failing jobs green if they are not required by the branch protection.

FWIW, some of the Windows tests pass, and I don't see any reason not to run those in the CI. We should at least verify that they keep working. It could be acceptable to somehow limit the test run to only the tests that are passing, for example.

@webknjaz First, that is not like xfail. Using xfail does not alter the outcome of a pytest run: pytest still reports green at the end. If GitHub jobs could do the same and avoid polluting the final result of the entire execution, I would not have proposed this change.

The reality is that, as far as we know, the Windows builds never worked. We are not talking about a temporary regression; we are talking about jobs that were broken from the moment they were added.

I do not have the bandwidth to fix them myself and I have not seen anyone working on that in two weeks either. I would not remove the jobs from the file, but I would comment them out so that we have at least two places reminding us that Windows is broken: this issue and the ci.yml file. In fact, yesterday I even pinned this issue to increase its visibility.

Having permanently broken jobs does degrade my ability to review incoming patches, as I am no longer able to easily identify which incoming patches reported green. The direct result of this is fewer reviews.

I wonder if @tomaciazek knows how to fix the Windows failed tests.

@webknjaz First, that is not like xfail. Using xfail does not alter the outcome of a pytest run: pytest still reports green at the end. If GitHub jobs could do the same and avoid polluting the final result of the entire execution, I would not have proposed this change.

I think you misunderstood the context here. What I mean is that we could add some conditional skips to the failing tests under known bad envs rather than not testing at all.
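As a rough illustration of that idea (the actual suite is mocha-based, so this is only the pytest analogue, with a hypothetical test name):

```python
import sys

import pytest

# Condition describing the environment that is known to be bad today.
ON_WINDOWS = sys.platform == "win32"


@pytest.mark.skipif(ON_WINDOWS, reason="known broken on Windows, see pinned issue")
def test_server_starts():
    # Placeholder body: the test still runs (and must pass) on every other
    # platform, so the job stays green without dropping coverage entirely.
    assert True
```

Mocha has a comparable runtime escape hatch: calling `this.skip()` from inside a test declared with a regular `function` (not an arrow function) when `process.platform === 'win32'`.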

The reality is that, as far as we know, the Windows builds never worked. We are not talking about a temporary regression; we are talking about jobs that were broken from the moment they were added.

And several tests are green. They should remain green.

I do not have the bandwidth to fix them myself and I have not seen anyone working on that in two weeks either. I would not remove the jobs from the file, but I would comment them out so that we have at least two places reminding us that Windows is broken: this issue and the ci.yml file. In fact, yesterday I even pinned this issue to increase its visibility.

See above. We can skip the failures to avoid those red flags but:

Having permanently broken jobs does degrade my ability to review incoming patches, as I am no longer able to easily identify which incoming patches reported green. The direct result of this is fewer reviews.

as I've mentioned in the previous post, it's the wrong approach. The PRs that are not marked as Draft should be reviewed regardless of the CI status. It shouldn't be an excuse to skip them.

@webknjaz Add the 9 required xfail conditions and I will be the first to approve the change. I doubt that mocha lacks a mechanism similar to pytest's xfail/skip.
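For reference, in pytest terms those per-test conditions could be written as a shared conditional xfail marker. This is only a sketch with hypothetical test names; mocha has no built-in xfail, so it would need conditional skips instead.

```python
import sys

import pytest

# One shared marker reused by every test that is known to fail on Windows.
KNOWN_BROKEN_ON_WINDOWS = pytest.mark.xfail(
    sys.platform == "win32",
    reason="Windows support is known to be broken, see the pinned issue",
    strict=True,  # fail loudly once a test unexpectedly starts passing
)


@KNOWN_BROKEN_ON_WINDOWS
def test_document_discovery():  # hypothetical test name
    assert sys.platform != "win32"


@KNOWN_BROKEN_ON_WINDOWS
def test_schema_validation():  # hypothetical test name
    assert sys.platform != "win32"
```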

Re the Draft status: based on my experience, most people do not know how to use it. Combined with the fact that moving a non-draft PR to draft resets reviews, that makes it unreliable. I almost always ignore drafts, but the reality is that most non-drafts I encounter are not really ready for review either (usually failing multiple checks). It would be awesome if GitHub prevented PRs from being created as non-draft until they pass CI.