tensorflow / build

Build-related tools for TensorFlow

Add arm64 third-party CI

bzhaoopenstack opened this issue

commented

System information

TensorFlow version (you are using): 2.0+ and master branch
Are you willing to contribute it (Yes/No): Yes
Describe the feature and the current behavior/state.
Currently, TensorFlow has an official build CI only on x86, plus third-party build CIs on x86 and ppc64. There is no CI for arm64. Adding an arm64 CI would help the community discover arm64 problems easily.

OpenLab provides a public CI system for open source projects[1]. It now supports the arm64 architecture, and TensorFlow nightly build jobs for 2.0+ and the master branch have been added there as well[2]. It runs the TensorFlow build every day at UTC-18.

Just as TensorFlow does for its current third-party CIs, we can simply add a new badge to the README.md file that links to the OpenLab arm64 third-party CI.

As you can see on the page[2], TensorFlow 2.0, 2.1 and 2.2 (master) build well on aarch64. However, some AWS libs depend strongly on the x86 architecture, so on the master branch I skip that part of the build. You can see the build summary in [3], download the built whl packages there, and see the detailed logs in [4].

So adding the arm64 build CI is useful for the community.

1: https://openlabtesting.org
2: http://status.openlabtesting.org/builds?project=tensorflow%2Ftensorflow
3: http://status.openlabtesting.org/build/c816e5c9d6cc4519b933414fc6044d28
4: https://logs.openlabtesting.org/logs/periodic-18/github.com/tensorflow/tensorflow/master/tensorflow-arm64-build-daily-v2.1.0/c816e5c/

Additional context
For now the test is CPU only and is based on Ubuntu 18.04 and Python 3.6. More can be added in the future.

I'm from the OpenLab community. I'll keep looking after the TensorFlow arm64 CI and will try my best to fix arm64 failures.

Here is an example[1] of what we did in the PyTorch community. See the 'Linux (aarch64) CPU' badge in the README.md.
Another example[2] is what we did in the greenplum-db community. See the 'Zuul Regression Test On Arm' badge in the README.md.

1: https://github.com/pytorch/pytorch/blob/master/README.md
2: https://github.com/greenplum-db/gpdb

Will this change the current api? How?
No
Who will benefit with this feature?
The arm64 users and developers

  • Notes

There is also some other discussion about the same problem in the tensorflow repo; see tensorflow/tensorflow#40463
https://groups.google.com/a/tensorflow.org/forum/#!topic/build/zTbmc0T6jAw

For now, @AshokBhat is working on fixing the aws-lab libs for ARM support. We already built the master branch after tensorflow/tensorflow#40700 in my local repo, with numpy downgraded via pip3 install numpy==1.18.0, using this command:
bazel clean --expunge ; bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package --local_ram_resources=10240 --local_cpu_resources=7 --verbose_failures
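
The commands quoted above can be sketched as a small Python helper that assembles them as argument lists, which makes the RAM and CPU limits easy to adjust per machine. This is only a minimal sketch: the function name `build_commands` and its parameters are hypothetical, the default values are the ones quoted in this comment, and running the numpy downgrade before the Bazel steps is an assumption about the order.

```python
import shlex


def build_commands(ram_mb=10240, cpus=7, numpy_version="1.18.0"):
    """Return the commands from the comment above as argument lists.

    Hypothetical helper: the names are illustrative; the flags and default
    values are the ones quoted in the thread.
    """
    return [
        # Assumption: the numpy downgrade happens before the build.
        ["pip3", "install", f"numpy=={numpy_version}"],
        ["bazel", "clean", "--expunge"],
        [
            "bazel", "build", "--config=opt",
            "//tensorflow/tools/pip_package:build_pip_package",
            f"--local_ram_resources={ram_mb}",
            f"--local_cpu_resources={cpus}",
            "--verbose_failures",
        ],
    ]


def as_shell_lines(commands):
    """Render each argument list as one shell-quoted command line."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]
```

Calling `as_shell_lines(build_commands(ram_mb=4096, cpus=2))` would give the same three command lines with smaller resource limits, which can then be pasted into a terminal or passed to `subprocess.run` one list at a time.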

Thanks to all of you, and to @AshokBhat for the kind help.

@bzhaoopenstack For your reference, the PowerPC tracking issue is at tensorflow/community#7. It will give you an idea of what other information was provided for PowerPC CI builds.

@angerson @wdirons @gunan
A couple of questions
(1) Where do you need this Arm CI tracking issue to be raised? I had assumed the tensorflow/build repo was the best place before I found the PowerPC tracking issue in tensorflow/community.
(2) What additional information do you need from us at Arm and at OpenLab, for this ticket to proceed?

Wow, that's great! Good to see, and thanks for following up with us. For anyone curious, the build script is here, I think: https://github.com/theopenlab/openlab-zuul-jobs/blob/master/playbooks/tensorflow-arm-build/run.yaml

July's SIG Build meeting (July 7th, see http://bit.ly/tf-sig-build-notes) is a good place to discuss what's next, especially since I think three parties were discussing ARM (Arm, Linaro, OpenLab). I think you probably have enough or almost enough here to add a badge to the TensorFlow README, and from there we can also talk about what to do for the Build repo.

Ping @gunan @perfinion.

commented

@AshokBhat Thanks for your kind suggestion. Much appreciated.

@angerson Thanks very much. I will try my best to join the meeting.

In terms of community support, the following will be the primary contacts from Arm and Linaro for TensorFlow CPU build/test failures
Arm - @cfRod
Linaro - @scornp

@AshokBhat for the questions:

  1. The tensorflow/build repository did not exist when we were setting up the PowerPC builds, so this repository should now hold these issues.
  2. https://github.com/tensorflow/community/blob/master/sigs/build/community-builds.md documents the idea at a high level. You set up a build that continuously tests TF at head. Then we work together and add it to the TF readme for now: https://github.com/tensorflow/tensorflow/blob/master/README.md and soon, hopefully, to a subpage on the website, but that is TBD. We will also update our triage team to redirect ARM issues that people see to the contacts you provided.

@gunan Thanks for the followup.
@bzhaoopenstack, we did bring up the OpenLab CI work in the monthly TF SIG Build meeting (with @angerson and other members of the team). From what I understand, the ask is for your team to raise a PR against the TensorFlow readme to link to the OpenLab CI.

commented

@AshokBhat @angerson Hi, thanks very much for your kind help.
After reading the meeting log, I really appreciate that you raised this in the meeting. That's really helpful for us.

We posted a PR for the OpenLab ARM build badge; see tensorflow/tensorflow#41223
Another thing that matters a lot to us: we have a strong desire to cooperate with everyone who wants to build ARM CI for TensorFlow. We can also provide test resources and so on. I expect we will get requests from the community about the basic aarch64 CI, since the CI is built for the TensorFlow community. I hope we can discuss more about the approach raised in your previous discussion. Thanks.

commented

How will we handle the custom-ops infra? See tensorflow/addons#1982

Is there a fixed URL where aarch64 wheels can be downloaded, without going through the list of Zuul builds?

I know how to find it. I am asking more about a URL I can give to wget and have it always fetch the latest build.

commented

OpenLab is planning to deploy a fixed URL for user downloads. Sorry about that; we are still thinking about how to do it in an appropriate way. It should be online soon, I think.
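
Until such a fixed URL exists, one possible workaround is to pick the newest successful build from a build listing and use its log location as the place to look for the wheel. The following is a minimal sketch under stated assumptions: each build record is assumed to be a dict with `job_name`, `result`, `end_time` and `log_url` keys, and `end_time` is assumed to be an ISO 8601 string in a single consistent format so that string comparison matches time order. These field names and the function name are hypothetical, not the confirmed schema of the OpenLab status page.

```python
def latest_successful_log_url(builds, job_name):
    """Return the log URL of the newest successful build of job_name.

    `builds` is a list of dicts; the keys used here are assumptions
    about the listing format, not a documented API.
    Returns None when no matching build exists.
    """
    candidates = [
        b for b in builds
        if b.get("job_name") == job_name
        and b.get("result") == "SUCCESS"
        and b.get("end_time")   # skip builds that have not finished
        and b.get("log_url")    # skip builds without a log location
    ]
    if not candidates:
        return None
    # ISO 8601 timestamps in one format sort correctly as plain strings.
    newest = max(candidates, key=lambda b: b["end_time"])
    return newest["log_url"]
```

A script could fetch the listing once, pass the parsed records to this function, and hand the resulting location to wget, so the caller never has to browse the build list by hand.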

Thanks. In the meantime I will look at building it on my own then (I have to build numpy anyway due to numpy/numpy#16677).

Where can I find your build script?