iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

Home Page: http://iree.dev/

Create a TSan continuous integration bot

bjacob opened this issue

We're ready for this! TSan is all green at the moment, and the steps to enable it are explained in updated sanitizers docs (#8968).

I'm down to try this week if I can get some pointers. @GMNGeoffrey, what are the out-of-git-tree steps required to create the bot? I'll also try to finally land the docs (#8968).

There are no longer any out-of-git-tree steps! Well, that's not quite true: we're waiting on some fixes to the Buildkite API that would let us create pipelines with specific permissions, so you have to do some tweaking after submission, but that bit isn't technically load-bearing. Write a script that does what you want, then write a YAML config like https://github.com/google/iree/blob/main/build_tools/buildkite/pipelines/untrusted/build-runtime-cmake.yml and trigger it in postsubmit.yml and presubmit.yml. e6fccb6 is an example, but you can ignore the Buildkite cfg file there: you don't need to add it again. There's more detail in the README at https://github.com/google/iree/tree/main/build_tools/buildkite
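For illustration, here is a rough sketch of what the "script that does what you want" part could look like for a TSan build. This is not the script that landed in the repo. The function names, the `build-tsan` directory, and the build type are made up for the example, and the `-DIREE_ENABLE_TSAN=ON` option is an assumption that should be checked against the sanitizers docs (#8968). It also assumes Ninja is installed and a CMake recent enough to support `ctest --test-dir` (3.20+).

```python
import shlex
import subprocess
import sys
from typing import List


def tsan_build_commands(source_dir: str, build_dir: str) -> List[List[str]]:
    """Return the configure, build, and test commands for a TSan run."""
    configure = [
        "cmake", "-G", "Ninja",
        "-S", source_dir,
        "-B", build_dir,
        # Hypothetical choice of build type for this sketch.
        "-DCMAKE_BUILD_TYPE=RelWithDebInfo",
        # Assumed option name; confirm it against the sanitizers docs.
        "-DIREE_ENABLE_TSAN=ON",
    ]
    build = ["cmake", "--build", build_dir]
    test = ["ctest", "--test-dir", build_dir, "--output-on-failure"]
    return [configure, build, test]


def run_all(commands: List[List[str]], dry_run: bool = False) -> int:
    """Run each command in order and stop at the first failure.

    Returns the exit code of the first failing command, or 0 if all pass.
    With dry_run=True the commands are only printed.
    """
    for cmd in commands:
        print("+ " + shlex.join(cmd), flush=True)
        if dry_run:
            continue
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0


if __name__ == "__main__":
    dry = "--dry-run" in sys.argv
    sys.exit(run_all(tsan_build_commands(".", "build-tsan"), dry_run=dry))
```

Running it with `--dry-run` prints the three commands without executing them, which is a cheap way to sanity-check the flags before wiring the script into a pipeline config.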

Thanks for the help! 🧑‍🔬 🐶 I have no idea what I'm doing, but this is my best first attempt at following the steps you outlined. How does this look? #9366

Good news: #9371 appears to be working and essentially ready.
Bad news: we are not TSan-green anymore; this has regressed since this issue was filed. I'm going to file issues for the data races that we have at the moment, and we'll have to fix those first.

OK, there seem to be only 2 issues:

  • #9392 may be the only general issue
  • #9393 seems to be Vulkan-related and might be specific to my system's particular Vulkan driver. Earlier I was not testing Vulkan at all, so that issue may have always existed. For now we can unblock this TSan CI bot by making it disable the Vulkan backend.

Update: #9371 is working and up for review! (#9392 was fixed by Ben yesterday, and #9393 was identified as a GPU driver issue, which the bot sidesteps by not actually running GPU tests.)