Run microsoft-go-innerloop as a blocking job for the release flow

dagood opened this issue 4 months ago.

#1156 split the internal build into an official/signing part and an innerloop part. The release pipeline hasn't been adjusted, so now it only waits for the official/signing part to complete. To get back to the status quo, we need the release pipeline to trigger the internal innerloop pipeline and wait for it to finish, ideally in parallel with the official/signing pipeline.

The risk of leaving this out is fairly low:

- The main risk would be missing a failed test and releasing a broken Go, but PR validation runs the same builders and configurations as the internal innerloop pipeline, and a release involves going through the PR process.
- One remaining risk is concurrent(-ish) PRs, which can each pass validation on their own but fail once both are merged. However, the release branches are low-traffic and controlled by a small team.
  - If this is a problem, we could mitigate it by enabling the branch protection flag that requires 100%-up-to-date PR validation runs before a merge. (This is slow if we have a series of port PRs to merge, but it is more effective, and easier to debug under pressure, than letting tests fail and investigating after the fact.)
- Another risk is subtle differences in the environment, but this is only theoretical and would more likely indicate an infra problem than a problem with Go.

Leaving it out also makes the release go faster, because we only need to wait for tests to run once (PR validation), not twice.

/cc @gdams @qmuntal

Quim Muntal · Answer 1 · Tue Mar 19 2024 23:15:20 GMT+0800 (China Standard Time)

> The risk of leaving this out is fairly low
>
> The main risk would be missing a failed test and releasing a broken Go

Even if the likelihood of a broken build is very low, the impact of releasing a broken Go is very high, so I would sleep more comfortably if the innerloop pipeline ran as part of the release process.

> If this is a problem, we could mitigate by enabling the branch protection flag that requires 100%-up-to-date PR validation runs before a merge. (This is slow if we have a series of port PRs to merge, but more effective and easier to debug under pressure than letting tests fail and investigating after the fact.)

I would prefer not to require 100%-up-to-date PR runs before a merge, given that the PR pipeline is very slow (~40min).

Davis Goodin · Answer 2 · Wed Mar 20 2024 02:28:52 GMT+0800 (China Standard Time)

> I would prefer not to require 100%-up-to-date PR runs before a merge, given that the PR pipeline is very slow (~40min).

Just in case this changes things: I'd only consider this for the release branches, never main. For the release branches, in my experience there's usually only one PR active at a time: backports happen over time, and release day activity is bundled together.

That said, getting back to the status quo (directly running innerloop as a dependency of release) rather than hashing out a new way to try to get the same coverage makes sense to me.

Also, it might be useful to have a place to run internal-only tests if needed, and this is probably the best place to trigger scenario tests.

Davis Goodin · Answer 3 · Sat Mar 23 2024 05:44:13 GMT+0800 (China Standard Time)

Done, although the infra doesn't do the blocking; the release runner has to. It shows up like this: #1172.

Figuring out a way to do joins (perhaps re-evaluating the cost of maintaining our own state somewhere rather than relying on AzDO re-queue behavior and focused polling) is on my mind for when we're ready to make larger changes to release automation.