.github/workflows/ci.yml: improvements

tarasmadan opened this issue 2 months ago

Taras Madan commented 2 months ago

Is your feature request related to a problem? Please describe.
The CI runner configuration based on ARC v2 could be improved.

Describe the solution you'd like

GCP autoscaling doesn't work. Let's monitor the progress of actions/runner-container-hooks#140.
The runners scale down to 0. By setting minRunners to 4, 6 or 8, we can save 30 seconds per job.
A force-push creates a new testing request, but we don't cancel the requests already running for that PR. It makes sense to cancel them; the GitHub documentation describes how to do it.
[Done, #4537] We don't limit the number of runners. Having more runners than instances may be the source of the timeout errors. It makes sense to set a fixed maxRunners.
Caching doesn't work: the effective cache size is 0.
[Done, it seems to be a side effect of using spot VMs] Some jobs finish with error 130.
[Done, #4558] Codecov requires additional configuration: it doesn't properly detect the "GitHub Actions" environment.
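For the minRunners and maxRunners items above, here is a minimal sketch of the relevant settings, assuming the runners are deployed with ARC's gha-runner-scale-set Helm chart. The values 4 and 8 are illustrative picks rather than measured ones, and the repository URL and secret name are placeholders.

```yaml
# values.yaml for the gha-runner-scale-set Helm chart (ARC v2); a sketch only.
# Placeholders: replace the URL and secret name with the real ones.
githubConfigUrl: "https://github.com/<org>/<repo>"
githubConfigSecret: "<secret-name>"

# Keep a few idle runners warm instead of scaling down to 0,
# so new jobs don't wait for a runner to start (illustrative value).
minRunners: 4

# Cap the total number of runners so ARC never schedules more runner pods
# than the node pool can actually host (illustrative value).
maxRunners: 8
```

The right maxRunners depends on how many runner pods each machine can fit, so it is worth checking the actual runners-per-machine count after applying it.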
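For the force-push item, GitHub Actions supports a top-level `concurrency` key that cancels older in-progress runs in the same group. A sketch, assuming ci.yml is triggered by `pull_request` events; grouping by PR number is one possible choice, not the only one.

```yaml
# Top-level key in .github/workflows/ci.yml (sketch).
# On pull_request events, every push or force-push to the same PR maps to the
# same group, so the newer run cancels the older in-progress one.
# On other events (e.g. pushes to master) there is no PR number, so the
# unique run_id gives each run its own group and nothing is cancelled.
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.run_id }}
  cancel-in-progress: true
```

Grouping by PR number rather than by branch name avoids collisions between PRs from different forks that happen to use the same branch name.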

Taras Madan · Answer 1 · Tue Mar 05 2024 02:27:46 GMT+0800 (China Standard Time)

#4537 unexpectedly limited the number of runners per machine to 1.

Taras Madan · Answer 2 · Tue Mar 05 2024 02:31:18 GMT+0800 (China Standard Time)

Removed the spot VM machine pool and added regular machines.
As a result, I no longer see jobs failing with error 130.
Most likely, error 130 was a side effect of preemption.
To use spot VMs, ARC would have to restart the job on error.
We could modify ci.yml to restart jobs, but I don't want to clutter the ci.yml file.

Taras Madan · Answer 3 · Wed Mar 06 2024 22:27:44 GMT+0800 (China Standard Time)

One more problem: Codecov detects the "Local" environment instead of "GitHub Actions".
Expected log: "['info'] Detected GitHub Actions as the CI provider."
Current log: "['info'] Detected Local as the CI provider."

Taras Madan · Answer 4 · Thu Mar 07 2024 03:08:36 GMT+0800 (China Standard Time)

#4558 works around the Codecov problem.