tensorflow / build

Build-related tools for TensorFlow

Internal build vs OSS misalignment

bhack opened this issue · comments

commented

We had some C++ flag misalignment between the OSS build and copybara. See:
tensorflow/tensorflow#56276 (comment)

We also had many problems with failure reproducibility between OSS and internal tests: we could not reproduce internal failures without forcing the OSS build with
this change: tensorflow/tensorflow@11dc383

We also had multiple rollbacks after merge.

/cc @cheshire any other note?

/cc @learning-to-play

commented

P.S. Nit: we need to enforce:
tensorflow/tensorflow#56276 (comment)

P.P.S. Linking required many CPU cores at every iteration because we had >50 targets to link (very resource intensive) just to run a single test after an edit. So we needed to link in parallel on many cores, with the speedup of LD Gold (#110).
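
For reference, a minimal sketch of how the gold linker could be selected from a .bazelrc. The config name `fastlink` is hypothetical and not part of the TensorFlow .bazelrc; `-fuse-ld=gold` is the usual GCC/Clang driver option for picking gold and assumes gold is installed on the machine.

```
# Hypothetical config name (an assumption for illustration only).
# Ask the compiler driver to link with gold instead of the default linker.
build:fastlink --linkopt=-fuse-ld=gold
build:fastlink --host_linkopt=-fuse-ld=gold
```

It would then be enabled with `--config=fastlink` on the bazel command line. How many link actions run in parallel is still governed by Bazel's `--jobs` setting.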

I also needed to keep the PR branch frozen, without rebasing or merging, for months, because a rebase would eventually invalidate the Bazel cache and require a monster build time again.
This is another problem for PRs that stay open for weeks or months, where you may need to rebase to resolve conflicts.

commented

We also had a quite tricky issue with filtering a single test during the development cycle.

You were already notified at:
tensorflow/tensorflow@0f9af91

commented

@mihaimaruseac I don't know if you could add something here for tensorflow/tensorflow#57468 (comment)

Thanks

I believe tensorflow/tensorflow@96b26a2 is a huge step in fixing this, fixing most (if not all) such issues.

commented

@cheshire Thanks, do you have a full list of the warnings currently enabled as errors in copybara-related jobs?

commented

I believe tensorflow/tensorflow@96b26a2 is a huge step in fixing this, fixing most (if not all) such issues.

We are still suppressing all the warnings:
https://github.com/tensorflow/tensorflow/blob/master/.bazelrc#L295-L301

# Suppress all C++ compiler warnings, otherwise build logs become 10s of MBs.
build:android --copt=-w
build:ios --copt=-w
build:linux --host_copt=-w
build:macos --copt=-w
build:windows --copt=/W0
build:windows --host_copt=/W0

With the mentioned commit we are only enabling the specific unused-result warning, but if we check this table, that flag is one of the default errors we also get with -Werror.

So the point here is to understand which flags the copybara builds use: if they use -Werror, that covers more than 50 warning types.
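
To make the difference concrete, here is a minimal sketch of promoting individual warnings to errors in a .bazelrc, as opposed to a blanket -Werror. The config name `strict_warnings` is hypothetical, and the second warning is only an illustration, not something known to be enabled in the copybara jobs.

```
# Hypothetical config name (an assumption, not part of the TensorFlow .bazelrc).
# Enable one specific warning, then promote only that warning to an error.
build:strict_warnings --copt=-Wunused-result
build:strict_warnings --copt=-Werror=unused-result
# Illustrative second warning; whether copybara builds enforce it is unknown.
build:strict_warnings --copt=-Wswitch
build:strict_warnings --copt=-Werror=switch
```

Each extra warning type has to be listed this way, which is why a full list from the internal side matters. As far as I know, a blanket `-w` on the same platform would still silence these, so they only take effect where `-w` is not passed.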

@mihaimaruseac I don't know if you could add something here for tensorflow/tensorflow#57468 (comment)

Sadly, I don't think we can do much here. The issue there was that an internal file with internal code also needed to be updated. Using copybara instead of the internal file would have been too cumbersome.

Though, I also think that this type of breakage is rare. It should only occur when you are adding new defines.