ROCm / rccl-tests

RCCL Performance Benchmark Tests

[Issue]: 'mpi.h' file not found during rccl-tests build

jzhang82119 opened this issue

Problem Description

I am following some old notes, so I am not sure whether I am doing something wrong here. The rccl-tests build fails with a "'mpi.h' file not found" error when I run:
MPICH=1 make NCCL_HOME=/home/jinzhang/rccl/build CUSTOM_RCCL_LIB=/home/jinzhang/rccl/build/librccl.so

make -C src build BUILDDIR=/home/jinzhang/rccl-tests/build
make[1]: Entering directory '/home/jinzhang/rccl-tests/src'
Compiling timer.cc > /home/jinzhang/rccl-tests/build/timer.o
Hipifying ../verifiable/verifiable.cu > /home/jinzhang/rccl-tests/build/hipify/verifiable.cu.cpp
Hipifying ../verifiable/verifiable.h > /home/jinzhang/rccl-tests/build/hipify/verifiable.h
Hipifying ../verifiable/../src/rccl_float8.h > /home/jinzhang/rccl-tests/build/hipify/rccl_float8.h
Compiling /home/jinzhang/rccl-tests/build/verifiable/verifiable.o
/opt/rocm/bin/hipcc -o /home/jinzhang/rccl-tests/build/verifiable/verifiable.o -std=c++14 -I/home/jinzhang/rccl/build/ -I/home/jinzhang/rccl/build/include -I/opt/rocm/include -I/opt/rocm/include/hip -O3 -DMPI_SUPPORT -I/usr/include/mpich -I/usr/include/x86_64-linux-gnu/mpich -c /home/jinzhang/rccl-tests/build/hipify/verifiable.cu.cpp
Hipifying all_reduce.cu > /home/jinzhang/rccl-tests/build/hipify/all_reduce.cu.cpp
Hipifying common.h > /home/jinzhang/rccl-tests/build/hipify/common.h
Compiling /home/jinzhang/rccl-tests/build/hipify/all_reduce.cu.cpp > /home/jinzhang/rccl-tests/build/all_reduce.o
/opt/rocm/bin/hipcc -o /home/jinzhang/rccl-tests/build/all_reduce.o -std=c++14 -I/home/jinzhang/rccl/build/ -I/home/jinzhang/rccl/build/include -I/opt/rocm/include -I/opt/rocm/include/hip -O3 -DMPI_SUPPORT -I/usr/include/mpich -I/usr/include/x86_64-linux-gnu/mpich -I. -c /home/jinzhang/rccl-tests/build/hipify/all_reduce.cu.cpp
In file included from /home/jinzhang/rccl-tests/build/hipify/all_reduce.cu.cpp:9:
/home/jinzhang/rccl-tests/build/hipify/common.h:16:10: fatal error: 'mpi.h' file not found
16 | #include "mpi.h"
| ^~~~~~~

1 error generated when compiling for gfx942.
failed to execute:/opt/rocm-6.2.0/lib/llvm/bin/clang++ --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 -o "/home/jinzhang/rccl-tests/build/all_reduce.o" -std=c++14 -I/home/jinzhang/rccl/build/ -I/home/jinzhang/rccl/build/include -I/opt/rocm/include -I/opt/rocm/include/hip -O3 -DMPI_SUPPORT -I/usr/include/mpich -I/usr/include/x86_64-linux-gnu/mpich -I. -c -x hip /home/jinzhang/rccl-tests/build/hipify/all_reduce.cu.cpp
make[1]: *** [Makefile:98: /home/jinzhang/rccl-tests/build/all_reduce.o] Error 1
make[1]: Leaving directory '/home/jinzhang/rccl-tests/src'
make: *** [Makefile:20: src.build] Error 2
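
The failing hipcc command passes two MPI include directories, /usr/include/mpich and /usr/include/x86_64-linux-gnu/mpich. A quick way to narrow this down is to check whether either directory actually contains mpi.h. The following is a minimal sketch, not part of rccl-tests; the function name find_mpi_header is hypothetical and only illustrates the check.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rccl-tests): report which of the given
# include directories contain mpi.h.
# Returns 0 if at least one directory has the header, 1 otherwise.
find_mpi_header() {
    local found=1 dir
    for dir in "$@"; do
        if [ -f "$dir/mpi.h" ]; then
            echo "found: $dir/mpi.h"
            found=0
        else
            echo "missing: $dir/mpi.h"
        fi
    done
    return "$found"
}

# The two MPI include directories taken from the -I flags in the build log.
find_mpi_header /usr/include/mpich /usr/include/x86_64-linux-gnu/mpich \
    || echo "mpi.h is not in either directory passed via -I"
```

If both directories are reported as missing, the header is either not installed or lives somewhere else. With MPICH, `mpicc -show` prints the compile line the wrapper would use, including its include path, which can be compared against the -I flags above.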

Operating System

OS: NAME="Ubuntu" VERSION="22.04.3 LTS (Jammy Jellyfish)"
CPU: model name : Intel(R) Xeon(R) Platinum 8480C
GPU:
Name: Intel(R) Xeon(R) Platinum 8480C
Marketing Name: Intel(R) Xeon(R) Platinum 8480C
Name: Intel(R) Xeon(R) Platinum 8480C
Marketing Name: Intel(R) Xeon(R) Platinum 8480C
Name: gfx942
Marketing Name: AMD Radeon Graphics
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
Name: gfx942
Marketing Name: AMD Radeon Graphics
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
Name: gfx942
Marketing Name: AMD Radeon Graphics
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
Name: gfx942
Marketing Name: AMD Radeon Graphics
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
Name: gfx942
Marketing Name: AMD Radeon Graphics
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
Name: gfx942
Marketing Name: AMD Radeon Graphics
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
Name: gfx942
Marketing Name: AMD Radeon Graphics
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
Name: gfx942
Marketing Name: AMD Radeon Graphics
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-

CPU

Intel(R) Xeon(R) Platinum 8480C

GPU

AMD Instinct MI300X

ROCm Version

ROCm 6.2.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response