elixir-nx / xla

Pre-compiled XLA extension

build for rocm not working

aschiavon91 opened this issue

Hello everyone
I'm trying to build XLA targeting ROCm, but I'm getting some Bazel errors asking for a rocRAND installation. I have already installed everything needed for ROCm, and I'm fairly sure it is all installed correctly (Stable Diffusion works from Python code).

Error on build

$ XLA_TARGET=rocm XLA_BUILD=true mix deps.compile xla  
==> xla
rm -f /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/tensorflow/compiler/xla/extension && \
	ln -s "/home/aschiavon/Projetos/elixir/livebook/deps/xla/extension" /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/tensorflow/compiler/xla/extension && \
	cd /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374 && \
	bazel build --define "framework_shared_object=false" -c opt   --config=rocm --action_env=HIP_PLATFORM=hcc //tensorflow/compiler/xla/extension:xla_extension && \
	mkdir -p /home/aschiavon/.cache/xla/0.4.1/cache/build/ && \
	cp -f /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz /home/aschiavon/.cache/xla/0.4.1/cache/build/xla_extension-x86_64-linux-rocm.tar.gz
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/.bazelrc:
  'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:rocm in file /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/.bazelrc: --crosstool_top=@local_config_rocm//crosstool:toolchain --define=using_rocm_hipcc=true --define=tensorflow_mkldnn_contraction_kernel=0 --repo_env TF_NEED_ROCM=1 --copt=-Wno-error=unused-result
INFO: Found applicable config definition build:linux in file /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-unknown-warning --copt=-Wno-array-parameter --copt=-Wno-stringop-overflow --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
Loading: 
Loading: 0 packages loaded
INFO: Repository local_config_rocm instantiated at:
  /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/WORKSPACE:15:14: in <toplevel>
  /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/tensorflow/workspace2.bzl:928:19: in workspace
  /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/tensorflow/workspace2.bzl:99:19: in _tf_toolchains
Repository rule rocm_configure defined at:
  /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/third_party/gpus/rocm_configure.bzl:888:33: in <toplevel>
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/tensorflow/runtime/archive/4ce3e4da2e21ae4dfcee9366415e55f408c884ec.tar.gz failed: class java.io.FileNotFoundException GET returned 404 Not Found
ERROR: An error occurred during the fetch of repository 'local_config_rocm':
   Traceback (most recent call last):
	File "/home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/third_party/gpus/rocm_configure.bzl", line 869, column 38, in _rocm_autoconf_impl
		_create_local_rocm_repository(repository_ctx)
	File "/home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/third_party/gpus/rocm_configure.bzl", line 547, column 35, in _create_local_rocm_repository
		rocm_config = _get_rocm_config(repository_ctx, bash_bin, find_rocm_config_script)
	File "/home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/third_party/gpus/rocm_configure.bzl", line 395, column 30, in _get_rocm_config
		config = find_rocm_config(repository_ctx, find_rocm_config_script)
	File "/home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/third_party/gpus/rocm_configure.bzl", line 373, column 41, in find_rocm_config
		exec_result = _exec_find_rocm_config(repository_ctx, script_path)
	File "/home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/third_party/gpus/rocm_configure.bzl", line 369, column 19, in _exec_find_rocm_config
		return execute(repository_ctx, [python_bin, "-c", decompress_and_execute_cmd])
	File "/home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/third_party/remote_config/common.bzl", line 230, column 13, in execute
		fail(
Error in fail: Repository command failed
ERROR: #define "ROCRAND_VERSION" is either
  not present in file /opt/rocm/rocrand/include/rocrand_version.h OR
  its value is not an integer literal
ERROR: /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/WORKSPACE:15:14: fetching rocm_configure rule //external:local_config_rocm: Traceback (most recent call last):
	File "/home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/third_party/gpus/rocm_configure.bzl", line 869, column 38, in _rocm_autoconf_impl
		_create_local_rocm_repository(repository_ctx)
	File "/home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/third_party/gpus/rocm_configure.bzl", line 547, column 35, in _create_local_rocm_repository
		rocm_config = _get_rocm_config(repository_ctx, bash_bin, find_rocm_config_script)
	File "/home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/third_party/gpus/rocm_configure.bzl", line 395, column 30, in _get_rocm_config
		config = find_rocm_config(repository_ctx, find_rocm_config_script)
	File "/home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/third_party/gpus/rocm_configure.bzl", line 373, column 41, in find_rocm_config
		exec_result = _exec_find_rocm_config(repository_ctx, script_path)
	File "/home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/third_party/gpus/rocm_configure.bzl", line 369, column 19, in _exec_find_rocm_config
		return execute(repository_ctx, [python_bin, "-c", decompress_and_execute_cmd])
	File "/home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/third_party/remote_config/common.bzl", line 230, column 13, in execute
		fail(
Error in fail: Repository command failed
ERROR: #define "ROCRAND_VERSION" is either
  not present in file /opt/rocm/rocrand/include/rocrand_version.h OR
  its value is not an integer literal
INFO: Repository bazel_skylib instantiated at:
  /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/WORKSPACE:11:14: in <toplevel>
  /home/aschiavon/.cache/xla_extension/tf-d5b57ca93e506df258271ea00fc29cf98383a374/tensorflow/workspace3.bzl:21:17: in workspace
Repository rule http_archive defined at:
  /home/aschiavon/.cache/bazel/aschiavon/d6a7ff46c87e6aa8aee90413457de5d7/external/bazel_tools/tools/build_defs/repo/http.bzl:355:31: in <toplevel>
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/protocolbuffers/upb/archive/9effcbcb27f0a665f9f345030188c0b291e32482.tar.gz failed: class java.io.FileNotFoundException GET returned 404 Not Found
ERROR: Skipping '//tensorflow/compiler/xla/extension:xla_extension': no such package '@local_config_rocm//rocm': Repository command failed
ERROR: #define "ROCRAND_VERSION" is either
  not present in file /opt/rocm/rocrand/include/rocrand_version.h OR
  its value is not an integer literal
WARNING: Target pattern parsing failed.
ERROR: no such package '@local_config_rocm//rocm': Repository command failed
ERROR: #define "ROCRAND_VERSION" is either
  not present in file /opt/rocm/rocrand/include/rocrand_version.h OR
  its value is not an integer literal
INFO: Elapsed time: 0.211s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
FAILED: Build did NOT complete successfully (0 packages loaded)
make: *** [Makefile:27: /home/aschiavon/.cache/xla/0.4.1/cache/build/xla_extension-x86_64-linux-rocm.tar.gz] Erro 1
could not compile dependency :xla, "mix compile" failed. Errors may have been logged above. You can recompile this dependency with "mix deps.compile xla", update it with "mix deps.update xla" or clean it with "mix deps.clean xla"
==> livebook
** (Mix) Could not compile with "make" (exit status: 2).
You need to have gcc and make installed. If you are using
Ubuntu or any other Debian-based system, install the packages
"build-essential". Also install "erlang-dev" package if not
included in your Erlang/OTP version. If you're on Fedora, run
"dnf group install 'Development Tools'".

The file that the error points to has this content:

$ cat /opt/rocm/rocrand/include/rocrand_version.h
/*
    Copyright (c) 2022 Advanced Micro Devices, Inc. All rights reserved.
*/

#ifndef ROCM_WRAPPER_ROCRAND_VERSION_H
#define ROCM_WRAPPER_ROCRAND_VERSION_H

#if defined(ROCM_NO_WRAPPER_HEADER_WARNING) || defined(ROCM_WRAPPER_GAVE_WARNING)
/* include file */
#include "../../include/rocrand/rocrand_version.h"
#else
/* give warning */
#if defined(_MSC_VER)
#pragma message(": warning:This file is deprecated. Use the header file from /opt/rocm-5.4.0/include/rocrand/rocrand_version.h by using #include <rocrand/rocrand_version.h>")
#elif defined(__GNUC__)
#pragma message(": warning : This file is deprecated. Use the header file from /opt/rocm-5.4.0/include/rocrand/rocrand_version.h by using #include <rocrand/rocrand_version.h>")
#endif
/* include file */
#define ROCM_WRAPPER_GAVE_WARNING
#include "../../include/rocrand/rocrand_version.h"
#undef ROCM_WRAPPER_GAVE_WARNING
#endif /* defined(ROCM_NO_WRAPPER_HEADER_WARNING) || defined(ROCM_WRAPPER_GAVE_WARNING) */

#endif /* ROCM_WRAPPER_ROCRAND_VERSION_H */

Does anyone have any idea?
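One thing worth noting about the header shown above: it is only a deprecation wrapper. It contains no `#define ROCRAND_VERSION` line of its own and instead includes the header under /opt/rocm-5.4.0/include/rocrand/, which would be consistent with the "not present in file" error. The following is a minimal sketch for checking both locations. The function name `has_integer_define` is hypothetical, and the check only approximates what the error message describes; it is not the actual logic of TensorFlow's ROCm configure script.

```shell
#!/bin/sh
# Succeed if FILE itself contains a line of the form "#define NAME <integer>".
# Hypothetical helper: an approximation of the check described by the error
# message, not the real implementation used by the TensorFlow build.
has_integer_define() {
  file="$1"
  name="$2"
  grep -Eq "^[[:space:]]*#[[:space:]]*define[[:space:]]+${name}[[:space:]]+[0-9]+([[:space:]]|\$)" "$file"
}

# Both paths are taken from the build log and the deprecation warning above.
wrapper=/opt/rocm/rocrand/include/rocrand_version.h
real=/opt/rocm-5.4.0/include/rocrand/rocrand_version.h

for header in "$wrapper" "$real"; do
  if [ ! -f "$header" ]; then
    echo "$header: not found"
  elif has_integer_define "$header" ROCRAND_VERSION; then
    echo "$header: defines ROCRAND_VERSION as an integer"
  else
    echo "$header: no integer ROCRAND_VERSION define in this file"
  fi
done
```

If only the second path reports a define, the configure step is probably reading the wrapper rather than the real header, which would point at a path mismatch between the TensorFlow revision and the ROCm 5.4 layout rather than at a missing rocRAND installation.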

@seanmor5 that looks related to me. I'm currently using ROCm 5.4; I will try building with older versions to see what happens.

@seanmor5 I confirmed it's related. Now I'm trying to figure out what changes or configuration I need in order to build from here.
In my last attempt I used the latest commit from the PR you pointed out, but without success: it looks like it is trying to build with CUDA support.

Just for info: in the meantime, I was able to build TensorFlow from the ROCm fork, ref: https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/tree/r2.11-rocm-enhanced

Here is the error I'm getting now:

ROCM_PATH=/opt/rocm-5.4.0 TF_ROCM_FUSION_ENABLE=1 XLA_TARGET=rocm XLA_BUILD=true mix compile
rm -f /home/aschiavon/.cache/xla_extension/tf-28b97bad94b65f8e743b87f80948cb44f95bacce/tensorflow/compiler/xla/extension && \
	ln -s "/home/aschiavon/Projetos/elixir/xla/extension" /home/aschiavon/.cache/xla_extension/tf-28b97bad94b65f8e743b87f80948cb44f95bacce/tensorflow/compiler/xla/extension && \
	cd /home/aschiavon/.cache/xla_extension/tf-28b97bad94b65f8e743b87f80948cb44f95bacce && \
	bazel build --define "framework_shared_object=false" -c opt   --config=rocm --action_env=HIP_PLATFORM=amd //tensorflow/compiler/xla/extension:xla_extension && \
	mkdir -p /home/aschiavon/.cache/xla/0.4.1/cache/build/ && \
	cp -f /home/aschiavon/.cache/xla_extension/tf-28b97bad94b65f8e743b87f80948cb44f95bacce/bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz /home/aschiavon/.cache/xla/0.4.1/cache/build/xla_extension-x86_64-linux-rocm.tar.gz
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /home/aschiavon/.cache/xla_extension/tf-28b97bad94b65f8e743b87f80948cb44f95bacce/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /home/aschiavon/.cache/xla_extension/tf-28b97bad94b65f8e743b87f80948cb44f95bacce/.bazelrc:
  'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file /home/aschiavon/.cache/xla_extension/tf-28b97bad94b65f8e743b87f80948cb44f95bacce/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /home/aschiavon/.cache/xla_extension/tf-28b97bad94b65f8e743b87f80948cb44f95bacce/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:rocm in file /home/aschiavon/.cache/xla_extension/tf-28b97bad94b65f8e743b87f80948cb44f95bacce/.bazelrc: --crosstool_top=@local_config_rocm//crosstool:toolchain --define=using_rocm_hipcc=true --define=tensorflow_mkldnn_contraction_kernel=0 --repo_env TF_NEED_ROCM=1 --copt=-Wno-error=unused-result
INFO: Found applicable config definition build:linux in file /home/aschiavon/.cache/xla_extension/tf-28b97bad94b65f8e743b87f80948cb44f95bacce/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-unknown-warning --copt=-Wno-array-parameter --copt=-Wno-stringop-overflow --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --copt=-Wswitch --copt=-Werror=switch --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /home/aschiavon/.cache/xla_extension/tf-28b97bad94b65f8e743b87f80948cb44f95bacce/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
Loading: 
Loading: 0 packages loaded
Analyzing: target //tensorflow/compiler/xla/extension:xla_extension (1 packages loaded, 0 targets configured)
INFO: Analyzed target //tensorflow/compiler/xla/extension:xla_extension (2 packages loaded, 1050 targets configured).
INFO: Found 1 target...
[0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt
[399 / 403] Linking tensorflow/compiler/xla/extension/libxla_extension.so; 1s local ... (2 actions running)
ERROR: /home/aschiavon/.cache/xla_extension/tf-28b97bad94b65f8e743b87f80948cb44f95bacce/tensorflow/compiler/xla/extension/BUILD:11:10: Linking tensorflow/compiler/xla/extension/libxla_extension.so failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt/bin/tensorflow/compiler/xla/extension/libxla_extension.so-2.params
/usr/bin/ld.gold: error: bazel-out/k8-opt/bin/_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudart___Ucuda_Scuda_Slib/libcudart.so: file is empty
collect2: error: ld returned 1 exit status
Target //tensorflow/compiler/xla/extension:xla_extension failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 6.884s, Critical Path: 6.26s
INFO: 5 processes: 5 internal.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
make: *** [Makefile:27: /home/aschiavon/.cache/xla/0.4.1/cache/build/xla_extension-x86_64-linux-rocm.tar.gz] Erro 1
** (Mix) Could not compile with "make" (exit status: 2).
You need to have gcc and make installed. If you are using
Ubuntu or any other Debian-based system, install the packages
"build-essential". Also install "erlang-dev" package if not
included in your Erlang/OTP version. If you're on Fedora, run
"dnf group install '

Oh, I think the issue is that we always try to include cuda_platform in the build, even when building for ROCm. We need to update the BUILD file to conditionally include either the CUDA platform or the ROCm platform, depending on the build configuration.

Can you try to build from this branch: https://github.com/elixir-nx/xla/tree/fix-rocm

It worked, thanks a lot!
The only change I had to make was to TENSORFLOW_GIT_REV; as I already said, that will probably be fixed in the next release of TF.

That's great! And you are able to use the ROCm GPU with EXLA without any issues?

Not really. I'm getting an EXLA error and can't really understand what's going on. Here's the stacktrace: https://pastebin.com/czX286tz
Edit: should I open an issue on EXLA?

They must have moved the types out of the TensorFlow namespace. I can't seem to find where they are in the TF source anymore.

Oh, that's a pity. Is it probably related to the next version? Is there something I can do for now?

You just need to find declarations in the TF source for the different datatypes and change the EXLA NIF to use those instead. It should be an easy fix after finding the type declarations!

What about this tsl project?
I was able to compile successfully using it. Now I'm trying to run it, but my GPU is not fully supported, so I'm trying to apply the navi10 hack (ref).

The hack is actually just setting the environment variable HSA_OVERRIDE_GFX_VERSION=10.3.0
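As a minimal sketch of how that override might be applied, assuming the ROCm runtime reads the variable from the environment of the process that loads it: the variable has to be exported in the shell that starts the Elixir application. The `iex -S mix` line is only an illustrative launch command, and the value 10.3.0 is the one quoted above, which may not suit other GPUs.

```shell
#!/bin/sh
# Set the override for this shell session and every process started from it.
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Confirm the value that child processes (such as the BEAM running EXLA) inherit.
sh -c 'echo "HSA_OVERRIDE_GFX_VERSION=$HSA_OVERRIDE_GFX_VERSION"'

# Then start the application from this same shell, for example:
#   iex -S mix
```

Exporting the variable rather than setting it inline keeps it in effect for every later command in the session, which helps when the build and the run happen in separate steps.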

Oh, so did you change the type aliases from using int8 = tensorflow::int8 to using int8 = tsl::int8? That will probably work.

Yes, and I also needed to change this line from xla::Status::OK() to tsl::OkStatus().
I will open a PR to the Nx repo with these changes.

This should be solved in the latest XLA. We still need to release a new EXLA, which will happen soon.