bsc-pm / nanox

Nanos++ is a runtime designed to serve as runtime support in parallel environments. It is mainly used to support OmpSs, an extension to OpenMP developed at BSC.

Home Page: https://pm.bsc.es/nanox


nanox cluster runtime error

AminSahebi opened this issue

Hi there,
I'm using OmpSs on an ARM-based cluster (UDOO boards), and everything works well there (using the AXIOM Project materials).
When I tried to port the same setup to a different cluster whose boards are x86_64-based, I hit the following runtime error:

Executing matrix multiplication on 8 boards...
WARNING: [?]plugin error=/home/udoo/nanox-install/lib/performance/libnanox-pe-cluster-mpi.so: undefined symbol: ompi_mpi_op_sum
terminate called after throwing an instance of 'nanos::FatalError'
what(): FATAL ERROR: [-1] Couldn't load Cluster support
matmul/matmul.sh: line 7: 28627 Aborted (core dumped) ~/matmul/dgemm_onelevel.perf
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[62796,1],0]
Exit code: 134
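
A possible first check, sketched under assumptions: the warning says the plugin could not resolve ompi_mpi_op_sum, so it may help to see whether that symbol is listed as undefined in the plugin and as defined in the libmpi that is found at run time. The helper name symbol_status is hypothetical; it only classifies lines in the format printed by nm -D.

```shell
# Hypothetical helper: classify a symbol from `nm -D` output read on stdin.
# Prints "undefined" if the symbol has type U, "defined" for any other type,
# and nothing if the symbol does not appear at all.
symbol_status() {
  awk -v sym="$1" '
    $NF == sym {
      if ($(NF - 1) == "U") print "undefined"; else print "defined"
      exit
    }
  '
}

# Usage on a real system (the libmpi path is an assumption; adjust to your install):
#   nm -D /home/udoo/nanox-install/lib/performance/libnanox-pe-cluster-mpi.so | symbol_status ompi_mpi_op_sum
#   nm -D /usr/local/lib/libmpi.so | symbol_status ompi_mpi_op_sum
```

If the plugin reports "undefined" and the libmpi picked up at run time does not report "defined", the plugin may have been built against a different MPI installation than the one being loaded.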

I guess it is related to the Open MPI configuration flags, but I'm not sure. Below are the flags I used to build each package and the details of the platform:
Platform: Ubuntu 16.04 (the same problem occurs on Ubuntu 18.04)
First, installing Open MPI (I tried both of the following procedures, without success):
First try: sudo apt install openmpi-bin openmpi-bin openmpi-dev

Second try: cd ~/openmpi-1.10.2/, then ./configure --enable-mpi-threads, then make, then make install

Then GASNet:

./configure --prefix=/home/$USER/gasnet-install --disable-aligned-segments --disable-pshm --disable-seq --disable-parsync --with-mpi-cc="mpicc -fPIC -DPIC" --with-mpi-cxx="mpicxx -fPIC -DPIC" CC="gcc -fPIC -DPIC" CFLAGS="-fPIC -DPIC" CXX="g++ -fPIC -DPIC" CXXFLAGS="-fPIC -DPIC" CPPFLAGS="-DPIC" LDFLAGS="-fPIC" --enable-mpi --enable-udp --enable-smp --disable-ibv
make
make install
Then Nanos++:

./configure --prefix=/home/$USER/nanox-install --with-gasnet=/home/$USER/gasnet-install --disable-debug --with-mpi-include=/usr/include/mpi --with-mpi-lib=/usr/lib MPICXX=mpicxx
make
make install
Then the Mercurium compiler (mcxx):
./configure --prefix=/home/$USER/mcxx-install --enable-ompss --enable-tl-openmp-nanox --with-nanox=/home/$USER/nanox-install
make
make install
I added the install directories to PATH (PATH=/home/udoo/nanox-install/bin:/home/udoo/gasnet-install/bin:/home/udoo/mcxx-install/bin:$PATH) and exported LD_LIBRARY_PATH. I also checked the Open MPI wrapper compilers and their linked libraries, shown below:

$ mpicxx -show
g++ -I/usr/local/include -pthread -Wl,-rpath -Wl,/usr/local/lib -Wl,--enable-new-dtags -L/usr/local/lib -lmpi_cxx -lmpi

$ mpicc -show
gcc -I/usr/local/include -pthread -Wl,-rpath -Wl,/usr/local/lib -Wl,--enable-new-dtags -L/usr/local/lib -lmpi

$ ldd /usr/bin/mpicc.openmpi
linux-vdso.so.1 => (0x00007ffdf031f000)
libopen-pal.so.13 => /usr/local/lib/libopen-pal.so.13 (0x00007f4351616000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f43513f9000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f435102f000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f4350e2b000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f4350c23000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f4350a20000)
/lib64/ld-linux-x86-64.so.2 (0x00007f43518f9000)
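
Since Nanos++ was configured with explicit --with-mpi-include and --with-mpi-lib paths, it may be worth comparing them with the directories the wrapper compiler actually uses. A minimal sketch follows; mpi_wrapper_dirs is a hypothetical helper that pulls the -I and -L directories out of a single `mpicxx -show` line.

```shell
# Hypothetical helper: read one `mpicxx -show` (or `mpicc -show`) line on stdin
# and print "include <dir>" for every -I flag and "lib <dir>" for every -L flag.
mpi_wrapper_dirs() {
  tr ' ' '\n' | sed -n \
    -e 's/^-I\(.*\)/include \1/p' \
    -e 's/^-L\(.*\)/lib \1/p'
}

# Usage on a real system:
#   mpicxx -show | mpi_wrapper_dirs
```

For the mpicxx output quoted above, this prints /usr/local/include and /usr/local/lib, which can then be compared with the /usr/include/mpi and /usr/lib values passed to the Nanos++ configure step.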

After all this, I tried to run the matrix multiplication example provided by BSC and got the runtime error shown above.

I would really appreciate any help.

Thanks