[CI] Add 'keep-going' label
alanwaketan opened this issue
Upstream has this cool 'keep-going' label that allows CI to continue running after test failures, so that we can catch all failures at once.
Let's add that to our repo too.
@wonjoolee95 has volunteered to work on this as part of his functionalization poc. Let me know if you need help!
Putting down some initial investigations.
As per circleci/circleci-docs#3505, it seems that accessing GitHub labels (like keep-going) is not possible from CircleCI jobs; it is only possible through GitHub Actions. PyTorch migrated to GitHub Actions (https://github.com/pytorch/pytorch/tree/master/.github) a while back, but we are still using CircleCI jobs (https://github.com/pytorch/xla/tree/master/.circleci). So implementing this feature in our CI would require us to migrate to GitHub Actions first.
However, what is possible (as a quick temporary solution) is to update our run_tests.sh script to not error out on failures, like so:
Lines 8 to 12 in 425da77
A developer can set CONTINUE_ON_ERROR to true and submit a PR. Then the CircleCI tests will keep going on failures. The developer should then set this back to false before merging their PR. On this bit, maybe we can introduce a lint job on master to make sure this flag is set to false.
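To make the idea concrete, here is a rough sketch of how such a flag and the lint check could look. This is not the actual content of run_tests.sh: the helper names run_test, summarize_failures and lint_continue_on_error are hypothetical, and the lint assumes the flag is assigned on its own line as CONTINUE_ON_ERROR=false (or =true). One choice that goes beyond what is described above: the sketch still reports a non-zero status at the end when anything failed, so the job turns red after running everything.

```shell
#!/bin/sh
# Hypothetical sketch only; the real run_tests.sh may be structured differently.

# Assumed flag: set to true to keep running after a test command fails.
CONTINUE_ON_ERROR="${CONTINUE_ON_ERROR:-false}"
FAILED_COUNT=0

# Run one test command. On failure, count it and either keep going
# (CONTINUE_ON_ERROR=true, returns 0) or signal the caller to stop (returns 1).
run_test() {
  if "$@"; then
    return 0
  fi
  FAILED_COUNT=$((FAILED_COUNT + 1))
  echo "FAILED: $*" >&2
  if [ "$CONTINUE_ON_ERROR" = "true" ]; then
    return 0
  fi
  return 1
}

# Return non-zero if any test command failed, so the CI job is still marked failed.
summarize_failures() {
  if [ "$FAILED_COUNT" -gt 0 ]; then
    echo "$FAILED_COUNT test command(s) failed" >&2
    return 1
  fi
  return 0
}

# Lint check: fail if the given script sets CONTINUE_ON_ERROR to true.
# Assumes the flag is assigned at the start of a line, optionally quoted.
lint_continue_on_error() {
  if grep -Eq '^[[:space:]]*CONTINUE_ON_ERROR="?true' "$1"; then
    echo "$1: CONTINUE_ON_ERROR must be false before merging" >&2
    return 1
  fi
  return 0
}
```

A test script could then call something like `run_test python3 some_test.py || exit 1` for each test (some_test.py is a placeholder) and finish with `summarize_failures`, while a lint job on master would run `lint_continue_on_error` against the script that holds the flag.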