tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone

Home Page: https://tensorflow.org


tensorflow for Nvidia TX1

jmtatsch opened this issue · comments

Hello,

@maxcuda recently got tensorflow running on the TK1, as documented in the blog post http://cudamusing.blogspot.de/2015/11/building-tensorflow-for-jetson-tk1.html, but has since been unable to build it repeatably. I am now trying to get tensorflow running on a TX1 Tegra platform and need some support.

Much of the trouble seems to come from Eigen's variadic templates and C++11 initializer lists, both of which should work according to http://devblogs.nvidia.com/parallelforall/cplusplus-11-in-cuda-variadic-templates/.
In theory -std=c++11 should be set according to the crosstool. Nevertheless, nvcc happily crashes on all of them. This smells as if the "-std=c++11" flag is not properly set.
How can I verify/enforce this?
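One thing I can try to verify it: feed nvcc a tiny probe file that refuses to compile unless C++11 mode and variadic templates are both active. A minimal sketch (file name and contents are mine, purely illustrative), compiled with nvcc -std=c++11 variadic_check.cu:

// variadic_check.cu
#if __cplusplus < 201103L
#error "-std=c++11 is not in effect"
#endif

#include <cstdio>

// A trivial variadic template; if nvcc accepts this translation unit,
// C++11 mode and basic variadic-template support are both working.
template <typename... Args>
int count_args(const Args&... args) {
  return sizeof...(args);
}

int main() {
  std::printf("args: %d\n", count_args(1, 2.0, 'x'));
  return 0;
}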

Also, in tensorflow.bzl, variadic templates in Eigen are said to be disabled:
"We have to disable variadic templates in Eigen for NVCC even though std=c++11 are enabled"
Is that still necessary?

Here is my build workflow:

git clone --recurse-submodules git@github.com:jmtatsch/tensorflow.git
cd tensorflow
grep -Rl "lib64" | xargs sed -i 's/lib64/lib/g' # no lib64 for tx1 yet
./configure
bazel build -c opt --local_resources 2048,0.5,1.0 --verbose_failures --config=cuda //tensorflow/cc:tutorials_example_trainer

Are you using JetPack 2?

No, JetPack does not support running directly on the L4T platform.


I meant: have you flashed the board with JetPack 2 to get CUDA 7 support?

Ah, yes, I have CUDA 7 support and used JetPack 2. To be more precise, the target is not actually a Jetson TX1 but a repurposed Nvidia Shield TV flashed with L4T 23.1 for Jetson.

I think there is a TX1 that I could use to take a look. I'll see what I can do.

In theory, can TensorFlow run usefully on the TK1? Or is the 2G memory too small for, say, face verification?

@robagar It all depends on how large your network is and whether you intend to train the model on TK1 or just run inference. Two GB of memory is plenty to run inference on almost any model.

I have worked around an issue that prevented nvcc from compiling the Eigen codebase on Tegra X1 (https://bitbucket.org/eigen/eigen/commits/d0950ac79c0404047379eb5a927a176dbb9d12a5).
However, so far I haven't succeeded in setting up bazel on the Tegra X1, so I haven't been able to start working on the other issues reported in http://cudamusing.blogspot.de/2015/11/building-tensorflow-for-jetson-tk1.html

That's good news ;) What's the problem with bazel? maxcuda's instructions for building bazel worked quite well for me...

For building bazel I had to use a special Java build which can cope with the 32-bit rootfs on a 64-bit machine:

wget http://www.java.net/download/jdk8u76/archive/b02/binaries/jdk-8u76-ea-bin-b02-linux-arm-vfp-hflt-04_jan_2016.tar.gz
sudo tar -zxvf jdk-8u76-ea-bin-b02-linux-arm-vfp-hflt-04_jan_2016.tar.gz -C /usr/lib/jvm
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.8.0_76/bin/java" 1
sudo update-alternatives --config java

There seems to be one Eigen issue I can't get around:

bazel build -c opt --local_resources 2048,0.5,1.0 --verbose_failures --config=cuda //tensorflow/cc:tutorials_example_trainer
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.io/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
INFO: Found 1 target...
INFO: From Compiling tensorflow/core/kernels/cross_op_gpu.cu.cc:
At end of source: warning: routine is both "inline" and "noinline"

external/eigen_archive/eigen-eigen-c5e90d9e764e/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h(125): warning: routine is both "inline" and "noinline"

At end of source: warning: routine is both "inline" and "noinline"

external/eigen_archive/eigen-eigen-c5e90d9e764e/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h(125): warning: routine is both "inline" and "noinline"

./tensorflow/core/lib/strings/strcat.h(195): internal error: assertion failed at: "/dvs/p4/build/sw/rel/gpu_drv/r346/r346_00/drivers/compiler/edg/EDG_4.9/src/decl_inits.c", line 3251


1 catastrophic error detected in the compilation of "/tmp/tmpxft_0000682d_00000000-8_cross_op_gpu.cu.cpp4.ii".
Compilation aborted.
Aborted
ERROR: /opt/tensorflow/tensorflow/core/BUILD:331:1: output 'tensorflow/core/_objs/gpu_kernels/tensorflow/core/kernels/cross_op_gpu.cu.o' was not created.
ERROR: /opt/tensorflow/tensorflow/core/BUILD:331:1: not all outputs were created.
Target //tensorflow/cc:tutorials_example_trainer failed to build
INFO: Elapsed time: 2271.358s, Critical Path: 2260.25s

Can you have a look at TensorEvaluator.h please?

I still haven't been able to install bazel. That said, the assertion you're facing seems to be triggered by the variadic template at line 195 of ./tensorflow/core/lib/strings/strcat.h. I would just comment out this code and see how it goes.

When you say maxcuda has "been unable to repeatedly build it" since then, does that mean that tensorflow is no longer working on the TK1? Because I just ordered the TK1 with the express purpose of being able to run tensorflow :-/

Yes, I have been unable to recompile the latest versions. The wheel I built around Thanksgiving should still work but it is quite an old version.

Commenting out the variadic template at line 195 helps a little, but at line 234 there is another template that seems to be required. Any hints on how to rewrite it in an nvcc-friendly manner?

@benoitsteiner
Any suggestions on how this could be rewritten in an nvcc-compatible manner?

// Support 5 or more arguments
template <typename... AV>
inline void StrAppend(string *dest, const AlphaNum &a, const AlphaNum &b,
                      const AlphaNum &c, const AlphaNum &d, const AlphaNum &e,
                      const AV &... args) {
  internal::AppendPieces(dest,
                         {a.Piece(), b.Piece(), c.Piece(), d.Piece(), e.Piece(),
                          static_cast<const AlphaNum &>(args).Piece()...});
}

Hi folks, I'm also working on building everything from scratch on the TX1. There have been lots of discussions here and on the Nvidia developer forums, but so far I haven't seen any well-summarized instructions besides the TK1 ones. Can we start another repo or script file so people can work on this more efficiently?

Imho we first have to solve the fundamental issue of the variadic templates not working with nvcc. Either the developers would have to do without those templates, which is backwards and probably not going to happen, or Nvidia has to step up and make nvcc more compatible. In theory nvcc should already be able to deal with your own variadic templates, but external headers, e.g. the STL's, won't "just work" because of the need to annotate all functions called on the device with __host__ __device__. Maybe someone knows a good way to get around this issue...
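To illustrate the annotation problem with a minimal sketch (my own, not code from TensorFlow or Eigen): nvcc rejects calls from device code into plain host functions, and plain host functions are exactly what the STL headers provide.

#include <algorithm>

__device__ int device_min(int a, int b) {
  // return std::min(a, b);  // error: calling a __host__ function from
                             // a __device__ function is not allowed
  return a < b ? a : b;      // fine: no host call involved
}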

@jmtatsch At the moment, the version of CUDA that ships with the Tegra X1 has problems with variadic templates. Nvidia is aware of this and working on a fix. I updated Eigen a few weeks ago to disable the use of variadic templates when compiling on Tegra X1, and that seems to fix the bulk of the problem. However, StrCat and StrAppend still rely on variadic templates. Until Nvidia releases a fix, the best solution is to comment out the variadic versions of StrCat and StrAppend and create non-variadic versions with up to 11 arguments (since that's what TensorFlow currently needs); one such overload is sketched below.
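A sketch of the six-argument case (illustrative only, reusing the string/AlphaNum/internal::AppendPieces names from strcat.h); the same pattern repeats for each arity TensorFlow actually calls:

// Non-variadic replacement for the six-argument StrAppend.
inline void StrAppend(string *dest, const AlphaNum &a, const AlphaNum &b,
                      const AlphaNum &c, const AlphaNum &d, const AlphaNum &e,
                      const AlphaNum &f) {
  internal::AppendPieces(dest, {a.Piece(), b.Piece(), c.Piece(), d.Piece(),
                                e.Piece(), f.Piece()});
}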
There are a couple of ways to avoid the STL issues. A brittle solution is to only compile optimized kernels: the compiler then inlines the STL code, at which point the lack of __host__ __device__ annotations doesn't matter since there is no function call to resolve. A better solution is to replace all the STL functionality with custom code. We've started to do this in Eigen by reimplementing most of the STL functions we need in the Eigen::numext namespace. This is tedious but much more reliable than relying on inlining to bypass the problem.
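The numext approach looks roughly like the sketch below (an illustration of the pattern, not Eigen's actual source): reimplement the few STL functions you need with explicit device annotations, so no host-only code is ever called from a kernel.

// Annotate with __host__ __device__ under nvcc, expand to nothing otherwise.
#ifdef __CUDACC__
#define DEVICE_FUNC_SKETCH __host__ __device__
#else
#define DEVICE_FUNC_SKETCH
#endif

namespace numext_sketch {
// Drop-in replacement for std::min that is callable from device code.
template <typename T>
DEVICE_FUNC_SKETCH inline const T& mini(const T& a, const T& b) {
  return b < a ? b : a;
}
}  // namespace numext_sketch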

I have a build of TF 0.8 but it requires a new 7.0 compiler that is not yet available to the general public.
I am building a wheel on a Jetson TK1, I will make it available after some testing.
I will update the instructions on how to build from source on cudamusing.

Good work @maxcuda! Will it build on the TX1 too?

Yes, it will build on the TX1 too. I fixed a problem with the new memory allocator to take into account the 32-bit OS. Some basic tests are passing, but the label_image test is giving wrong results, so there may be some other places with 32-bit issues.

@benoitsteiner, with the new compiler your change to Eigen is not required anymore (and it forces one to edit a bunch of files). Could you please remove the check and re-enable variadic templates?

@maxcuda Where can I download the new cuda compiler? I'd like to make sure that I don't introduce new problems when I enable variadic templates again.

@maxcuda is the new 7.0 compiler you were referencing part of Jetpack 2.2 that was just released?

Yes, you can get it with:
wget http://developer.download.nvidia.com/embedded/L4T/r24_Release_v1.0/CUDA/cuda-repo-l4t-7-0-local_7.0-76_armhf.deb

The good news is that I was able to build v0.8, but some of the results are incorrect. I will update the blog with the changes. With v0.9 I had a problem with the cudnn.cc file; it looks like it cannot handle cuDNN v2.

Thanks so much. Looking forward to your post so I can get tensorflow running on the TX1

I updated my build instructions on cudamusing and also posted a wheel file.

Has anyone tested this on the Jetson TX1? I can't seem to get bazel to build on aarch64.

@syed-ahmed I tested it on the TX1. This is my configuration.

  • Cuda Toolkit 7.0, JetPack 2.2(32bit)
  • Bazel 0.2.1
  • jdk-8u76-ea-bin-b02-linux-arm-vfp-hflt-04_jan_2016.tar.gz
  • ./configure : compute capability 5.3
  • bazel option : --local_resources 2048,2.0,1.0

@syed-ahmed I got it to build on an aarch64 TX1. I mostly followed the instructions for the TK1 at cudamusing.blogspot.de. The only additional things I did were

  • Added aarch64 to the ARM enum in /bazel/src/main/java/com/google/devtools/build/lib/util/CPU.java by changing line 28 to "ARM("arm", ImmutableSet.of("arm", "armv7l", "aarch64"))," without quotes
  • Added aarch64 as a valid ARM machine type in /bazel/scripts/bootstrap/buildenv.sh by changing line 35 to "if [ "${MACHINE_TYPE}" = 'arm' -o "${MACHINE_TYPE}" = 'armv7l' -o "${MACHINE_TYPE}" = 'aarch64' ]; then" without quotes

Or, if you prefer, here is the bazel executable for aarch64 I ended up with: https://drive.google.com/file/d/0B8Gc_oVaYC7CWEhOMHJhc0hLY0U/view?usp=sharing

Maybe make a PR against bazel?


@tylerfox Thank you! I'll try your suggestions. In the meantime, any thoughts on this: bazelbuild/bazel#1264 and @wtfuzz's change for cc_configure.bzl? I was getting a toolchain error, so I'm wondering if you encountered it.

Did you also build with the latest bazel release or with 0.1.4? And how about the tensorflow version - r0.8?

@syed-ahmed yes, changing buildenv.sh should fix that issue. It's also worth noting that I used bazel 0.1.4 per the instructions on cudamusing. I should probably test on the current version of bazel as well, but for now I know 0.1.4 works.

I am trying to build the tensorflow r0.9 release. I got bazel 0.2.1 installed following @tylerfox's suggestions, but I'm getting the following error when trying to build tensorflow. Any thoughts? Appreciate all the help.

>>>>> # @farmhash_archive//:configure [action 'Executing genrule @farmhash_archive//:configure [for host]']
(cd /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/tensorflow && \
  exec env - \
    PATH=/usr/local/cuda-7.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/ubuntu/bazel/output/ \
  /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; pushd external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260; workdir=$(mktemp -d -t tmp.XXXXXXXXXX); cp -a * $workdir; pushd $workdir; ./configure; popd; popd; cp $workdir/config.h bazel-out/host/genfiles/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260; rm -rf $workdir;')
ERROR: /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/external/farmhash_archive/BUILD:5:1: Executing genrule @farmhash_archive//:configure failed: bash failed: error executing command 
  (cd /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/tensorflow && \
  exec env - \
    PATH=/usr/local/cuda-7.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/ubuntu/bazel/output/ \
  /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; pushd external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260; workdir=$(mktemp -d -t tmp.XXXXXXXXXX); cp -a * $workdir; pushd $workdir; ./configure; popd; popd; cp $workdir/config.h bazel-out/host/genfiles/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260; rm -rf $workdir;'): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
/home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/tensorflow/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260 /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/tensorflow
/tmp/tmp.ZKGtjQ4mLO /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/tensorflow/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260 /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/tensorflow
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking build system type... /tmp/tmp.ZKGtjQ4mLO/missing: Unknown `--is-lightweight' option
Try `/tmp/tmp.ZKGtjQ4mLO/missing --help' for more information
configure: WARNING: 'missing' script is too old or missing
./config.guess: unable to guess system type

This script, last modified 2010-08-21, has failed to recognize
the operating system you are using. It is advised that you
download the most up to date version of the config scripts from

  http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD
and
  http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD

If the version you run (./config.guess) is already up to date, please
send the following data and any information you think might be
pertinent to <config-patches@gnu.org> in order to provide the needed
information to handle your system.

config.guess timestamp = 2010-08-21

uname -m = aarch64
uname -r = 3.10.96-tegra
uname -s = Linux
uname -v = #1 SMP PREEMPT Tue May 17 16:29:05 PDT 2016

/usr/bin/uname -p = 
/bin/uname -X     = 

hostinfo               = 
/bin/universe          = 
/usr/bin/arch -k       = 
/bin/arch              = 
/usr/bin/oslevel       = 
/usr/convex/getsysinfo = 

UNAME_MACHINE = aarch64
UNAME_RELEASE = 3.10.96-tegra
UNAME_SYSTEM  = Linux
UNAME_VERSION = #1 SMP PREEMPT Tue May 17 16:29:05 PDT 2016
configure: error: cannot guess build type; you must specify one

Does anyone know what farmhash is being used for in tensorflow r0.9? My motivation for installing tensorflow 0.9 on the Jetson TX1 is solely to use some of the fp16 ops. Hence, if farmhash is not doing anything important, maybe I could remove the farmhash-related code and build without it. Here is the farmhash commit.

Temporary sources used in the build process can be found in ~/.cache/bazel. cd to this directory and search for config.guess: find ./ -name "config.guess".
You might get several files, but the paths should give you a clue which config.guess is the one from farmhash. In my case it is ./_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260/config.guess
In this file replace the line
UNAME_MACHINE=`(uname -m) 2>/dev/null` || UNAME_MACHINE=unknown
with
UNAME_MACHINE=armhf

On my machine (Nvidia Shield TV flashed to L4T 23.1) farmhash built successfully after this change.

I successfully built tensorflow on a TX1 with L4T 24.1 (64-bit), with the following patch. But running the example failed with the following kernel message:
tutorials_examp[31026]: unhandled level 1 translation fault (11) at 0xffffffffffffe8, esr 0x92000005

Maybe farmhash.BUILD with --build=arm-linux-gnu is wrong? But I failed to compile it with --build=aarch64-linux-gnu. I'm still trying to figure out what causes the runtime failure.
tx1_patch.zip

@benoitsteiner has re-enabling variadic templates been verified to work?

@shingchuang have you found the root cause of the segmentation fault issue? I have the same problem on an aarch64 platform.

I tried to re-enable variadic templates last night after upgrading the CUDA compiler using http://developer.download.nvidia.com/embedded/L4T/r24_Release_v1.0/CUDA/cuda-repo-l4t-7-0-local_7.0-76_armhf.deb. This new compiler appears to fix some of the issues, but I still get some crashes.

I noticed that Nvidia released an even more recent version of the compiler. @maxcuda, is there a debian package that I can use to install the latest version of the CUDA SDK?

Re-install / re-flash using JetPack 2.3 because the latest release also updated to Ubuntu 16.04 aarch64 in addition to CUDA 8 and L4T R24.2. The underlying CUDA version is tied to the L4T BSP in JetPack.

Hi all. I'm trying to build TensorFlow for the Google Pixel C in order to use its TX1 GPU. Do you build it on your machine (e.g. a Mac) or on the device itself (e.g. the Pixel C)? Does anyone have the already-generated files for the TX1, or can anyone point me in the right direction? Thanks.

Hi all - I haven't gotten TensorFlow r0.11 working yet, but I do have a working path to an r0.9 TensorFlow install on the TX1 with JetPack 2.3. I have tested basic MLP/LSTM/conv nets and it seems to work, though it OOMs out pretty easily on bigger convs.

I wrote down all my steps and patches below in case it's helpful to anyone. Really appreciated all the above commentary; it was critical to tracking down the right path.

http://stackoverflow.com/questions/39783919/tensorflow-on-nvidia-tx1/


@dwightcrow, I tried your solution and it works on the TX1, thank you. Version 0.11.0rc0 can also be built with bazel 0.3.2.

That's fantastic. Bazel 0.3.2 builds fairly easily on TX1?

Wondering if there's a concise summary of everything in this issue? It would definitely make it easier for others trying to get TF working on a TX1.

Following up on the request for a summary to build tensorflow on a Jetson TX1. Any help is appreciated.

The problem is that there are too many moving pieces. Each set of instructions may fail when Bazel/Protobuf/Eigen/TF are updated.

@dwightcrow Hi Dwight, at some point in the instructions you say:
"Need an edit to recognize aarch64 as ARM"

Can you please expand: edit what? Also, can we update the answer to build the latest version?

I agree with @sunils27 and @maxcuda that we need a more stable set of instructions for specific components.

Thank you very much for the effort and time to support the community.

Furthermore, if there is a stable set of build instructions, it becomes accessible to more people who can help with its upkeep when the packages mentioned by @maxcuda are updated.

I've re-enabled support for variadic templates on Tegra X1, provided that one uses JetPack 2.3 (in previous versions nvcc crashes when compiling some of the variadic templates). I haven't yet tried to compile TensorFlow itself, but this should reduce the number of code changes necessary to work around the lack of IndexList on Tegra.

While a stable set of instructions may remain elusive, one effective way of documenting a working set is to create your own fork of each of the repos and push any changes you need to make as commits on one branch per version of TF you're targeting. Then in a write-up you can refer to specific branches/commits that are known to work. You could even go a step further by creating a meta repo which has references to each of those commits; git submodules (as much as I dislike them) are one way, another is using simple scripts to automate what your write-up describes.

In other words: have a personal github fork of bazel, tensorflow, etc. and a branch called something like "topic/tf_v0.10" on each fork. Then, optionally, a new repo altogether which unifies them, and a community of folks such as we have on this thread could collaborate to push updates to it as we try different things.

Right, is anyone able to advise where those changes need to go in the bazel part of the instructions on StackOverflow? Any help is greatly appreciated. While this doesn't solve the bigger problems with getting tensorflow to work on the SATV, it does offer me (and others) the chance to get it going in the current format.

Build tensorflow r0.11 on Nvidia TX1 failed

Error message:

ERROR: .../tensorflow/core/kernels/BUILD:1096:1: C++ compilation of rule '//tensorflow/core/kernels:svd_op' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command
 ...
com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4.
gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5.4/README.Bugs> for instructions.
Target //tensorflow/cc:tutorials_example_trainer failed to build

My build steps and environment:

Environment

  • Hardware: Nvidia TX1
  • OS: JetPack 2.3 (Ubuntu 16.04)
  • cuDNN:5.1
  • CUDA: 8

Install Java

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer

Install some deps

$ sudo apt-get install git zip unzip autoconf automake libtool curl zlib1g-dev maven
$ sudo apt-get install python-numpy swig python-dev python-wheel

Build protobuf

# For grpc-java build
$ git clone https://github.com/google/protobuf.git
$ cd protobuf
$ git checkout master
$ ./autogen.sh
$ git checkout v3.0.0-beta-3
$ ./autogen.sh
$ LDFLAGS=-static ./configure --prefix=$(pwd)/../
$ sed -i -e 's/LDFLAGS = -static/LDFLAGS = -all-static/' ./src/Makefile
$ make -j 4
$ make install


# For bazel build
$ git checkout v3.0.0-beta-2
$ ./autogen.sh
$ LDFLAGS=-static ./configure --prefix=$(pwd)/../
$ sed -i -e 's/LDFLAGS = -static/LDFLAGS = -all-static/' ./src/Makefile
$ make -j 4
$ cd ..

Build grpc-java compiler

$ git clone https://github.com/neo-titans/odroid.git
$ git clone https://github.com/grpc/grpc-java.git
$ cd grpc-java/
$ git checkout v0.15.0
$ patch -p0 < ../odroid/build_tensorflow/grpc-java.v0.15.0.patch
$ CXXFLAGS="-I$(pwd)/../include" LDFLAGS="-L$(pwd)/../lib" ./gradlew java_pluginExecutable -Pprotoc=$(pwd)/../bin/protoc
$ cd ..

Build bazel

$ git clone https://github.com/bazelbuild/bazel.git
$ cd bazel
$ git checkout 0.3.2
$ cp ../protobuf/src/protoc third_party/protobuf/protoc-linux-arm32.exe
$ cp ../grpc-java/compiler/build/exe/java_plugin/protoc-gen-grpc-java third_party/grpc/protoc-gen-grpc-java-0.15.0-linux-arm32.exe

Modify some files for build on aarch64

diff --git a/compile.sh b/compile.sh
index 53fc412..11035d9 100755
--- a/compile.sh
+++ b/compile.sh
@@ -27,7 +27,7 @@ cd "$(dirname "$0")"
 # Set the default verbose mode in buildenv.sh so that we do not display command
 # output unless there is a failure.  We do this conditionally to offer the user
 # a chance of overriding this in case they want to do so.
-: ${VERBOSE:=no}
+: ${VERBOSE:=yes}

 source scripts/bootstrap/buildenv.sh

diff --git a/scripts/bootstrap/compile.sh b/scripts/bootstrap/compile.sh
index 77372f0..657b254 100755
--- a/scripts/bootstrap/compile.sh
+++ b/scripts/bootstrap/compile.sh
@@ -48,6 +48,7 @@ linux)
   else
     if [ "${MACHINE_IS_ARM}" = 'yes' ]; then
       PROTOC=${PROTOC:-third_party/protobuf/protoc-linux-arm32.exe}
+      GRPC_JAVA_PLUGIN=${GRPC_JAVA_PLUGIN:-third_party/grpc/protoc-gen-grpc-java-0.15.0-linux-arm32.exe}
     else
       PROTOC=${PROTOC:-third_party/protobuf/protoc-linux-x86_32.exe}
       GRPC_JAVA_PLUGIN=${GRPC_JAVA_PLUGIN:-third_party/grpc/protoc-gen-grpc-java-0.15.0-linux-x86_32.exe}
@@ -150,7 +151,7 @@ function java_compilation() {

   run "${JAVAC}" -classpath "${classpath}" -sourcepath "${sourcepath}" \
       -d "${output}/classes" -source "$JAVA_VERSION" -target "$JAVA_VERSION" \
-      -encoding UTF-8 "@${paramfile}"
+      -encoding UTF-8 "@${paramfile}" -J-Xmx500M

   log "Extracting helper classes for $name..."
   for f in ${library_jars} ; do
diff --git a/src/main/java/com/google/devtools/build/lib/util/CPU.java b/src/main/java/com/google/devtools/build/lib/util/CPU.java
index 41af4b1..4d80610 100644
--- a/src/main/java/com/google/devtools/build/lib/util/CPU.java
+++ b/src/main/java/com/google/devtools/build/lib/util/CPU.java
@@ -26,7 +26,7 @@ public enum CPU {
   X86_32("x86_32", ImmutableSet.of("i386", "i486", "i586", "i686", "i786", "x86")),
   X86_64("x86_64", ImmutableSet.of("amd64", "x86_64", "x64")),
   PPC("ppc", ImmutableSet.of("ppc", "ppc64", "ppc64le")),
-  ARM("arm", ImmutableSet.of("arm", "armv7l")),
+  ARM("arm", ImmutableSet.of("arm", "armv7l", "aarch64")),
   UNKNOWN("unknown", ImmutableSet.<String>of());

   private final String canonicalName;
diff --git a/third_party/grpc/BUILD b/third_party/grpc/BUILD
index 2ba07e3..c7925ff 100644
--- a/third_party/grpc/BUILD
+++ b/third_party/grpc/BUILD
@@ -29,7 +29,7 @@ filegroup(
         "//third_party:darwin": ["protoc-gen-grpc-java-0.15.0-osx-x86_64.exe"],
         "//third_party:k8": ["protoc-gen-grpc-java-0.15.0-linux-x86_64.exe"],
         "//third_party:piii": ["protoc-gen-grpc-java-0.15.0-linux-x86_32.exe"],
-        "//third_party:arm": ["protoc-gen-grpc-java-0.15.0-linux-x86_32.exe"],
+        "//third_party:arm": ["protoc-gen-grpc-java-0.15.0-linux-arm32.exe"],
         "//third_party:freebsd": ["protoc-gen-grpc-java-0.15.0-linux-x86_32.exe"],
     }),
 )
diff --git a/third_party/protobuf/BUILD b/third_party/protobuf/BUILD
index 203fe51..4c2a316 100644
--- a/third_party/protobuf/BUILD
+++ b/third_party/protobuf/BUILD
@@ -28,6 +28,7 @@ filegroup(
         "//third_party:darwin": ["protoc-osx-x86_32.exe"],
         "//third_party:k8": ["protoc-linux-x86_64.exe"],
         "//third_party:piii": ["protoc-linux-x86_32.exe"],
+        "//third_party:arm": ["protoc-linux-arm32.exe"],
         "//third_party:freebsd": ["protoc-linux-x86_32.exe"],
     }),
 )
diff --git a/tools/cpp/cc_configure.bzl b/tools/cpp/cc_configure.bzl
index aeb0715..688835d 100644
--- a/tools/cpp/cc_configure.bzl
+++ b/tools/cpp/cc_configure.bzl
@@ -150,7 +150,12 @@ def _get_cpu_value(repository_ctx):
     return "x64_windows"
   # Use uname to figure out whether we are on x86_32 or x86_64
   result = repository_ctx.execute(["uname", "-m"])
-  return "k8" if result.stdout.strip() in ["amd64", "x86_64", "x64"] else "piii"
+  machine = result.stdout.strip()
+  if machine in ["arm", "armv7l", "aarch64"]:
+   return "arm"
+  elif machine in ["amd64", "x86_64", "x64"]:
+   return "k8"
+  return "piii"


 _INC_DIR_MARKER_BEGIN = "#include <...>"

compile

$ ./compile.sh 
$ cd ..

Build Tensorflow

$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ git checkout v0.11.0rc2

According to StackOverflow's tensorflow-on-nvidia-tx1 answer, modify the following:

diff --git a/tensorflow/core/kernels/BUILD b/tensorflow/core/kernels/BUILD
index 2e04827..9d81923 100644
--- a/tensorflow/core/kernels/BUILD
+++ b/tensorflow/core/kernels/BUILD
@@ -1184,7 +1184,7 @@ tf_kernel_libraries(
         "segment_reduction_ops",
         "scan_ops",
         "sequence_ops",
-        "sparse_matmul_op",
+        #DC "sparse_matmul_op",
     ],
     deps = [
         ":bounds_check",
diff --git a/tensorflow/core/kernels/cwise_op_gpu_select.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_select.cu.cc
index 02058a8..880a0c3 100644
--- a/tensorflow/core/kernels/cwise_op_gpu_select.cu.cc
+++ b/tensorflow/core/kernels/cwise_op_gpu_select.cu.cc
@@ -43,8 +43,14 @@ struct BatchSelectFunctor<GPUDevice, T> {
     const int all_but_batch = then_flat_outer_dims.dimension(1);

 #if !defined(EIGEN_HAS_INDEX_LIST)
-    Eigen::array<int, 2> broadcast_dims{{ 1, all_but_batch }};
-    Eigen::Tensor<int, 2>::Dimensions reshape_dims{{ batch, 1 }};
+    // Eigen::array<int, 2> broadcast_dims{{ 1, all_but_batch }};
+    Eigen::array<int, 2> broadcast_dims;
+   broadcast_dims[0] = 1;
+    broadcast_dims[1] = all_but_batch;
+    // Eigen::Tensor<int, 2>::Dimensions reshape_dims{{ batch, 1 }};
+    Eigen::Tensor<int, 2>::Dimensions reshape_dims;
+   reshape_dims[0] = batch;
+   reshape_dims[1] = 1;
 #else
     Eigen::IndexList<Eigen::type2index<1>, int> broadcast_dims;
     broadcast_dims.set(1, all_but_batch);
diff --git a/tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc b/tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc
index a177696..28d2f59 100644
--- a/tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc
+++ b/tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc
@@ -104,9 +104,17 @@ struct SparseTensorDenseMatMulFunctor<GPUDevice, T, ADJ_A, ADJ_B> {
     int n = (ADJ_B) ? b.dimension(0) : b.dimension(1);

 #if !defined(EIGEN_HAS_INDEX_LIST)
-    Eigen::Tensor<int, 2>::Dimensions matrix_1_by_nnz{{ 1, nnz }};
-    Eigen::array<int, 2> n_by_1{{ n, 1 }};
-    Eigen::array<int, 1> reduce_on_rows{{ 0 }};
+    // Eigen::Tensor<int, 2>::Dimensions matrix_1_by_nnz{{ 1, nnz }};
+    Eigen::Tensor<int, 2>::Dimensions matrix_1_by_nnz;
+   matrix_1_by_nnz[0] = 1;
+   matrix_1_by_nnz[1] = nnz;
+    // Eigen::array<int, 2> n_by_1{{ n, 1 }};
+    Eigen::array<int, 2> n_by_1;
+   n_by_1[0] = n;
+   n_by_1[1] = 1;
+    // Eigen::array<int, 1> reduce_on_rows{{ 0 }};
+    Eigen::array<int, 1> reduce_on_rows;
+   reduce_on_rows[0]= 0;
 #else
     Eigen::IndexList<Eigen::type2index<1>, int> matrix_1_by_nnz;
     matrix_1_by_nnz.set(1, nnz);
diff --git a/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc b/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
index 52256a7..1d027b9 100644
--- a/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
+++ b/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
@@ -888,6 +888,9 @@ CudaContext* CUDAExecutor::cuda_context() { return context_; }
 // For anything more complicated/prod-focused than this, you'll likely want to
 // turn to gsys' topology modeling.
 static int TryToReadNumaNode(const string &pci_bus_id, int device_ordinal) {
+// DC - make this clever later. ARM has no NUMA node, just return 0
+LOG(INFO) << "ARM has no NUMA node, hardcoding to return zero";
+return 0;
 #if defined(__APPLE__)
   LOG(INFO) << "OS X does not support NUMA - returning NUMA node zero";
   return 0;
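
For anyone wondering what the EIGEN_HAS_INDEX_LIST patches above work around: with index lists, Eigen encodes compile-time-constant dimensions in the type itself, while the fallback builds the shapes at runtime, and it is the brace-initialized runtime form that crashes nvcc on Tegra. A hedged sketch of the two forms (my own, assuming Eigen's unsupported Tensor module is on the include path):

#include <unsupported/Eigen/CXX11/Tensor>

void shapes_example(int all_but_batch) {
#if defined(EIGEN_HAS_INDEX_LIST)
  // Index 0 is the compile-time constant 1; only index 1 is set at runtime.
  Eigen::IndexList<Eigen::type2index<1>, int> broadcast_dims;
  broadcast_dims.set(1, all_but_batch);
#else
  // Plain runtime array, assigned element by element to avoid the
  // initializer-list constructor that nvcc mishandles on Tegra.
  Eigen::array<int, 2> broadcast_dims;
  broadcast_dims[0] = 1;
  broadcast_dims[1] = all_but_batch;
#endif
  (void)broadcast_dims;  // silence unused-variable warnings
}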

build

$ ./configure
$ bazel build -c opt --jobs 2 --local_resources 1024,4.0,1.0 --config=cuda //tensorflow/tools/pip_package:build_pip_package


@elirex I'm pretty sure you're still running out of memory even with the --local_resources flag. Try adding some swap space.

@tylerfox I tried that, but it doesn't work.

After enabling the swap space, I'd still opt for more memory and less CPU when building. Try something like bazel build -c opt --local_resources 3072,0.5,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package (as I understand it, the three --local_resources values are available RAM in MB, number of CPU cores, and I/O capability).

Note this will take several hours to build with these settings, but it's the only way I've found to work. Hope that helps.

@elirex, hi, you mentioned "Modify some files for build on aarch64", but I don't know which files need to be modified and how.
There is a similar description in tensorflow-on-nvidia-tx1: "Need an edit to recognize aarch64 as ARM".
Thanks!

@ShawnXuan - these are files in the cloned bazel repo. The change proposed on StackOverflow, for example, would be made to CPU.java as shown in the diff. You can see which additional files elirex changed by looking at their diff. Hope that helps.

@elirex Did you manage to compile ?

@piotrchmiel Yes, I successfully completed the compilation. I added 8 GB of swap space and ran bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package

While compiling, I used free -h and top to watch memory usage. TensorFlow needs about 8 GB of memory to compile.

Thank you 👍 I will try to repeat your steps :-)

Question:

For those that compiled TensorFlow 0.9 on the Jetson TX1, which options did you use during the TensorFlow ./configure step?

Error 1:

I received "Error: unexpected EOF from Bazel server" after following the steps from this StackOverflow guide on a fresh install of JetPack 2.3.

Two bazel issue responders (1, 2) suggested people use the --jobs 4 or --jobs 20 option when receiving this error, in case the error was due to a lack of memory.

I ran bazel again, this time with --jobs 4; however, I received a new error ("Error 2", below).

The remainder of the error said Contents of '/home/ubuntu/.cache/bazel/_bazel_ubuntu/(xxxx)/server/jvm.out': with no further output.

Error 2:

ERROR: /home/ubuntu/tensorflow/tensorflow/core/kernels/BUILD:309:1: C++ compilation of rule '//tensorflow/core/kernels:mirror_pad_op' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object ... (remaining 105 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4. gcc: internal compiler error: Killed (program cc1plus)

I didn't use bazel clean --expunge before the second attempt. Maybe that caused the error.

Plan:

  • Run bazel clean --expunge
  • Rerun bazel to create the cache folder
  • Re-add config.guess and config.sub to the cache folder
  • Create 8GB of swap space
  • Try bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package because @elirex had success with it.

It worked

Following this StackOverflow guide, but with an 8 GB swap file and using the following command, successfully built TensorFlow 0.9 on the Jetson TX1 from a fresh install of JetPack 2.3:

bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package

I used the default settings for TensorFlow's ./configure script except to enable GPU support.

My build took at least 6 hours. It'll be faster if you use an SSD instead of a USB drive.

Thanks to Dwight Crow, @elirex, @tylerfox, everyone that helped them, and everyone in this thread for spending time on this problem.

Creating a swap file

# Create a swapfile for Ubuntu at the current directory location
fallocate -l *G swapfile
# List out the file
ls -lh swapfile
# Change permissions so that only root can use it
chmod 600 swapfile
# List out the file
ls -lh swapfile
# Set up the Linux swap area
mkswap swapfile
# Now start using the swapfile
sudo swapon swapfile
# Show that it's now being used
swapon -s

Adapted from JetsonHacks' gist.

I used this USB drive to store my swap file.

The most memory I saw my system use was 7.7 GB (3.8 GB on Mem and 3.9 GB on Swap). The most swap memory I saw used at once was 4.4 GB. I used free -h to view memory usage.

Creating the pip package and installing

Adapted from the TensorFlow docs:

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

# The name of the .whl file will depend on your platform.
$ pip install /tmp/tensorflow_pkg/tensorflow-0.9.0-py2-none-any.whl

I used bazel build -c opt --local_resources 1024,4.0,1.0 --jobs 4 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package, without allocating swap, to build tensorflow r0.9 on the TX1 with JetPack 2.3, and it passed compilation.

Has anyone been able to build TF r0.11 on the TX1 yet?

Thanks for all the information here - I got tensorflow r0.11.0 installed with JetPack 2.3.1 on the TX1. Following @elirex's steps, make sure to use the exact versions of protobuf, grpc and bazel. I built tensorflow r0.11.0 instead of v0.11.0rc2. When compiling, follow @MatthewKleinsmith's step to add a swap file. You need a big swap: I tried 6 GB but failed in the middle with an out-of-memory error; trying again with a 10 GB swap file worked. It took me about 5 hours to compile with the swap file allocated on a USB drive.

Is tensorflow working correctly on the TX1, i.e. able to run inference and get good results? When I installed tensorflow on a TK1 it ran just fine, however the convolutional layers were producing bad results. I could train fully connected models on MNIST just fine, but when I tried to use conv layers it stopped converging. Is this problem present in the TX1 build?

I continually get this when running ./compile.sh for Bazel:
Building Bazel from scratch
gRPC Java plugin not found in

If I pull 0.2.3 I don't get the error, only with 0.3.x

@zxwind How is TF 0.11 performance working for you on the TX1?

FYI, I've got a branch off r1.0 with some hacks to build the r1.0 release on TX1 with Jetpack 2.3.1.

In addition to the previously mentioned issues, there is a change in Eigen after the revision used on the TF r0.11 branch that causes the CUDA compiler to crash with an internal error. I changed workspace.bzl on the r1.0 branch to point to the older Eigen revision. In order for that to build, I had to remove the EXPM1 op that was added after r0.11. It's all rather ugly, but it got me up and running.

Interesting to note: with the r1.0.0a build I'm able to run inference on a Resnet50-based network at 128x96 resolution that was running out of memory on r0.11. For anyone curious about benchmark numbers, I was getting approx 15 fps with single-frame batches.

Here is a link to a tag on my clone of TF with binary wheels for anyone interested. The wheels will likely only work on JetPack 2.3.1 (L4T 24.2.1). No guarantees there aren't some serious issues, but I've verified results on the networks I'm using right now.
https://github.com/rwightman/tensorflow/releases/tag/v1.0.0-alpha-tegra-ugly_hack

Closing since @rwightman's / @MatthewKleinsmith's solution seems to work, though it's not quite a seamless out-of-the-box experience. Feel free to reopen.

@rwightman May I humbly ask you to provide another wheel for the r1.0 stable version?

@rwightman How were you able to build tensorflow without gRPC? Thanks!

Edit: never mind, I saw your repo: https://github.com/jetsonhacks/installTensorFlowTX1/

Thanks for setting that up.

@sunsided Here's the Python 3.5.2 version for TF 1.0.1 that @dkopljar and I managed to build: https://drive.google.com/open?id=0B2jw9AHXtUJ_OFJDV19TWTEyaWc

Hello all, I was able to install TensorFlow v1.0.1 on the new Jetson TX2. I had to follow similar process as mentioned above in this thread (protobuf, grpc, swapfile etc). For bazel, I downloaded bazel-0.4.5-dist.zip and applied @dtrebbien's change. Here is the pip wheel of my installation if it helps anyone. It's for Python 2.7: https://drive.google.com/file/d/0Bxl-G9VJ61mBYmZPY0hLSlFaUDg/view?usp=sharing
And here is the step-by-step procedure: https://syed-ahmed.gitbooks.io/nvidia-jetson-tx2-recipes/content/first-question.html

Hello all, I was able to install TensorFlow v1.0.1 on the Tegra X1 using the build by @Barty777.
Is there a build available for TensorFlow v1.2?

@Barty777 you wouldn't happen to have 3.6 wheels, would you? 🙏

@gvoysey Unfortunately no. :(

Here is the wheel file for TensorFlow 1.2, Nvidia TX1 and Python 2.7: https://drive.google.com/file/d/0B-Ljdh8jFZRbTnVNdGtGMHA2Ymc/view?usp=sharing

I've been able to build a tensorflow wheel for Python 3.6 for the TX1, but I cannot successfully build TensorFlow with GPU support. See https://stackoverflow.com/questions/45825708/error-building-tensorflow-gpu-1-1-0-on-nvidia-jetson-tx1-aarch64 for details.

Sorry for the late comment - can anyone please help me with setting up TensorFlow on the Nvidia TK1?