openfisca / openfisca-core

OpenFisca core engine. See other repositories for country-specific code & data.

Home Page: https://openfisca.org

Use release candidates instead of peer reviews for approval of large changesets

MattiSG opened this issue

Problem statement

Very large changesets to the codebase are necessary when implementing major overhauls, even when they have no functional impact: for example, when refactoring vocabulary or architecture, or when changing formatting.

Since 2016, mandatory peer reviews have been the way in which quality is managed. While this system has demonstrated its power in eliminating the vast majority of regressions and broken production builds, as well as in sharing code knowledge, it also has a cost and tends to slow down changes.

In the specific case of very large changesets, reviews can be so expensive (over 4 hours of review for a single PR) that they wait a very long time before being done: they are always postponed by core staff who have to handle more urgent matters, and never picked up by volunteers for whom they are too costly.
As time passes, the changeset becomes irrelevant and needs major investment in rebases, worsening the problem as previous reviews have to be dismissed.
This leads to major overhauls and necessary maintenance never being deployed, with PRs waiting for months if not years and creating a deadlock of obsolete dependencies and architecture.

State of the art

Different approaches have been tried and failed:

  • increasing investment from sponsors on specific topics (more PRs are opened but not more are merged);
  • fast dirty reviews (author usually does not dare to act upon them);
  • RFCs / specification (the implementation still needs to be reviewed).

Proposal

I instead suggest a novel approach for OpenFisca: deploying release candidates (RCs), that is, versions of the package published under the next major version number but not installed automatically (a short sketch after the list below illustrates this), and asking country packages to test these RCs.

  1. A pull request is opened for a changeset of at least L SLoC.
  2. At least P people (the author and P - 1 reviewers, or P reviewers) comment that they:
    • consider the changeset too large for manual review;
    • ask for activation of the pre-release process;
    • commit to trying the pre-release on a specific country package.
  3. The version number is marked as pre-release according to SemVer§9, with suffix -rc.X, where X is the iteration number.
  4. All automated tests pass.
  5. The pre-release is deployed manually.
  6. Reviewers approve or request changes to the changeset based on the result of their testing of the pre-release on the country package they committed to testing with.
  7. Once at least Q people out of the P have approved the PR and none have requested changes, it is considered valid.
  8. The version number is changed to remove the pre-release marking.
  9. The pull request is merged and automatically deployed.
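
For readers who want to see what "not automatically installed" means in practice, here is a small illustrative sketch (the version number is made up, not a planned release): Python's packaging library, which pip relies on for version handling, parses a SemVer-style -rc.X suffix as a pre-release that sorts before the final version, so it is skipped unless explicitly requested.

```python
# Illustrative only: how version tooling treats a SemVer-style pre-release.
# "41.0.0" is a made-up version number, not a planned OpenFisca-Core release.
from packaging.version import Version

final = Version("41.0.0")
candidate = Version("41.0.0-rc.1")  # normalised by PEP 440 to "41.0.0rc1"

assert candidate.is_prerelease and not final.is_prerelease
assert candidate < final  # the release candidate sorts before the final version

# Because of this, pip ignores the candidate by default; testers opt in, e.g.:
#   pip install OpenFisca-Core==41.0.0rc1   (or: pip install --pre OpenFisca-Core)
print(candidate, "<", final)  # prints: 41.0.0rc1 < 41.0.0
```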

Process

I open this issue to foster a discussion and generate consensus. I expect participants to either:

  • State their support for the proposal and specify their expected values for L, P, and Q.
  • Suggest changes to the proposal.
  • Explain why this proposal is not sound and offer an alternative way to solve the stated problem.
  • Explain why the problem is irrelevant.

If consensus is reached, we will open a pull request against the Core CONTRIBUTING file with this new process, which will hopefully unblock many long-overdue major PRs 🙂 Thank you for your participation!

Sounds good to me. I don't have a suggestion for L and leave it to the experts, but I would have P = 4 to 5 (distinct country packages) and Q = 3 to 4.

Hello,
Nice proposal.

I think above 1,000 lines of code it becomes difficult, but it depends whether the change is limited to a few new files or spread across 500 existing files.

The pre-release is deployed manually.
You mean a manual CI task to deploy to PyPI, or doing it manually locally?

it depends if it is limited to few new files or spread across 500 existing files

Interesting point, thanks! In the proposed phrasing, I use the SLOC condition as a prerequisite, not as a systematic trigger. The idea is to prevent using release candidates systematically as a way to circumvent peer reviews, which yield other benefits. So if the SLOC size condition is met but the “number of people requesting the release candidate option” is not, a situation which is likely to happen in the case where it's all in new files, then we stick to the default “peer review” process 🙂

You mean a manual CI task to deploy to PyPI, or doing it manually locally?

Whichever works. This proposal is agnostic to how the release candidate deployment is made, as long as the trigger is manual and voluntary.
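
To make "manual and voluntary" concrete, here is one possible shape for that step, offered as a hedged sketch rather than OpenFisca's actual release tooling: a small script (run locally, or from a manually triggered CI job) that builds the rc-suffixed version and uploads it to PyPI, assuming the build and twine packages are available.

```python
#!/usr/bin/env python3
"""Sketch only, not OpenFisca's actual release tooling: build and upload the
pre-release declared in setup.py. Can be run locally or from a CI job whose
trigger is manual and voluntary."""
import glob
import subprocess
import sys

def sh(*cmd):
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

if __name__ == "__main__":
    sh(sys.executable, "-m", "build")  # build sdist + wheel into dist/
    sh(sys.executable, "-m", "twine", "upload", *glob.glob("dist/*"))  # upload to PyPI
```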

P = 4 to 5

That sounds a bit ambitious to me… If we try to name them, who would that be? France, Tunisia, Aotearoa, Australia?

I can go with P = 2, but then Q = 2... so we should go with P = 3, Q = 2, right?

Can't we automate the process with a list of packages? When there is such a PR, automatically create a PR with a dependency on the openfisca-core release candidate and see if the tests pass.

Can't we automate the process with a list of packages?

This would be quite a bit of work (see #1158 for what it already entails just for templates). Given the currently very limited resources for Core, it is more actionable to agree on a process that can then be automated if it demonstrates its value, rather than to invest up front in automating something that has not been proven useful. Thank you for your understanding.
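
For the record, and without committing anyone to building it now, a very rough sketch of what that automation could look like is given below: a script that, given an RC version and a list of country package repositories, checks each one out, installs it together with the release candidate, and runs its test suite. The repository list, the RC version, and the assumption that pytest drives every country package's tests are placeholders, not agreed choices.

```python
#!/usr/bin/env python3
"""Rough sketch only: test country packages against an openfisca-core release
candidate. Repository URLs, the RC version and the use of pytest as the test
runner are placeholder assumptions."""
import subprocess
import sys
import tempfile
from pathlib import Path

RC_VERSION = "41.0.0rc1"  # placeholder release-candidate version
COUNTRY_REPOS = [         # placeholder list of country package repositories
    "https://github.com/openfisca/country-template",
]

def run(*cmd, cwd=None):
    print("+", " ".join(cmd))
    return subprocess.call(cmd, cwd=cwd)

def main():
    failures = []
    with tempfile.TemporaryDirectory() as workdir:
        for repo in COUNTRY_REPOS:
            checkout = Path(workdir) / repo.rsplit("/", 1)[-1]
            run("git", "clone", "--depth", "1", repo, str(checkout))
            # Install the country package, then force the core RC on top of
            # whatever version it pins (ideally inside a fresh virtualenv).
            run(sys.executable, "-m", "pip", "install", "--editable", str(checkout))
            run(sys.executable, "-m", "pip", "install", f"OpenFisca-Core=={RC_VERSION}")
            if run(sys.executable, "-m", "pytest", cwd=checkout) != 0:
                failures.append(repo)
    for repo in failures:
        print(f"Tests failed against {RC_VERSION}: {repo}")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())
```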

Hi! I like it, yet I have the same question about L. I have a concrete example #1146.

  1. It is beyond 2k lines, so it'd qualify.
  2. That's a feature I wouldn't merge, even if it were reviewed, before it has been tested.

Proposal

1.A. A pull request is open, is not approved, and falls under any of the following categories:

  • It's a bug affecting at least the author
  • It has been without review for +4w
  • It's a breaking change
  • It's over 1k SLoC

2.A. P = 3

7.A. Q = 2/3 of the original P; and Q = any P. So if persons A, B, and C are P, then Q needs at least A, B, and D.

  • That's a feature I wouldn't merge, even if it were reviewed, before it has been tested.

The example provided is a draft. But, assuming it would be the final version, I still understand that testing might be necessary.
This proposal does not aim to prevent country-package-based testing in other cases. It aims to offer a clearly defined process in which no full peer review is required, because such a review is impossible in practice.

  • It is beyond 2k lines, so it'd qualify.

I am sorry if I was not clear: the SLOC condition is mandatory but not sufficient. This pull request would not be eligible for automatic merge without a review. It would merely allow reviewers to state that they cannot review the whole changeset and commit to trying the feature in some country package they are familiar with. If reviewers (including you) believe that the feature should not be merged even after a test, there is no reason for it to be merged just on the grounds that it was tested with full-scale country packages.

A pull request is open, is not approved, and falls under any of the following categories:

  • It's a bug affecting at least the author

This seems to me like free-for-all merging, basically allowing reviews to be systematically bypassed 🤔

Oh sorry, I misunderstood the purpose; I was indeed just outlining conditions for pre-releasing.

Assuming it would be the final version, I still understand that testing might be necessary.

I believe so, yet I now get that you're trying to solve another problem first.

This pull request would not be eligible for automatic merge without a review. It would merely allow reviewers to state that they cannot review the whole changeset and commit to trying the feature in some country package they are familiar with.

Sure that's what I meant.

This seems to me like free-for-all merging, basically allowing reviews to be systematically bypassed 🤔

That is a problem we do not have today, yet I understand you may be afraid given the past history of OpenFisca.

My reasoning is that, from a resiliency perspective, the last and only person who will keep an eye on an open bug that goes unresolved, in a worst-case scenario, is the person whose value-adding capacity is handicapped by the non-resolution of that bug.

Of course there's a trade-off between two hypothetical cases: one where nobody does anything, and one where people govern by decree (free-for-all merging).

Here's my updated and simplified proposal then:

Proposal

1.B. A pull request is open, is not approved, and falls under any of the following categories:

  • It has been without review for +4w
  • Its last review was +4w ago
  • It's over 1k SLoC

2.B. P = 3

7.B. Q = 2/3 of the original P.

Sounds good, I like the idea of bypassing the SLoC condition and considering lack of review as a sign of specific complexity that is not captured by diff size itself 👍 Since we still need P people to commit to testing, this would still prevent abuse through pushing changes that go unreviewed for lack of interest.

The updated proposal thus looks like:

Proposal v2

I instead suggest a novel approach for OpenFisca: deploying release candidates (RCs), that is, versions of the package published under the next major version number but not installed automatically, and asking country packages to test these RCs.

  1. A pull request is open for a changeset, it is not approved, and it matches any of the following conditions:
    A. its changeset is of at least 1000 SLoC;
    B. it has not been reviewed in the last 4 weeks.
  2. At least 3 people (the author and 2 reviewers, or 3 reviewers) comment that they:
    • consider the changeset too large or complex for manual review;
    • ask for activation of the pre-release process by mentioning #1159;
    • commit to trying the pre-release on at least one specific country package.
  3. The version number is marked as pre-release according to SemVer§9, with suffix -rc.X, where X is the iteration number.
  4. All automated tests pass.
  5. The pre-release is deployed manually.
  6. Reviewers approve or request changes to the changeset based on the results of testing the pre-release on the country package they committed to (see the sketch after this list).
  7. Once at least 2 people out of the 3 have approved the PR and none have requested changes, it is considered valid.
  8. The version number is changed to remove the pre-release marking.
  9. The pull request is merged and automatically deployed.
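
As a concrete illustration of step 6 (purely an example; the package name and version numbers are made up), a reviewer who committed to testing could temporarily point their country package's core dependency at the release candidate:

```python
# Hypothetical excerpt of a country package's setup.py, temporarily pinned to
# the release candidate for testing. Name and versions are placeholders.
from setuptools import setup, find_packages

setup(
    name="OpenFisca-CountryExample",
    version="1.2.3",
    packages=find_packages(),
    install_requires=[
        # The usual constraint would look like "OpenFisca-Core >= 40, < 41".
        # Pinning the rc explicitly is needed because pre-releases are not
        # selected by default:
        "OpenFisca-Core == 41.0.0rc1",
    ],
)
```

Equivalently, the reviewer can leave setup.py untouched and simply run pip install OpenFisca-Core==41.0.0rc1 in their test environment before running the country package's test suite.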

I suggest we try out this proposal in the wild and see how it fares 🙂