Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana University Research and Technology Corporation. All rights reserved. Copyright (c) 2004-2007 The University of Tennessee and The University of Tennessee Research Foundation. All rights reserved. Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, University of Stuttgart. All rights reserved. Copyright (c) 2004-2007 The Regents of the University of California. All rights reserved. Copyright (c) 2006-2018 Cisco Systems, Inc. All rights reserved. Copyright (c) 2006-2011 Mellanox Technologies. All rights reserved. Copyright (c) 2006-2012 Oracle and/or its affiliates. All rights reserved. Copyright (c) 2007 Myricom, Inc. All rights reserved. Copyright (c) 2008-2019 IBM Corporation. All rights reserved. Copyright (c) 2010 Oak Ridge National Labs. All rights reserved. Copyright (c) 2011 University of Houston. All rights reserved. Copyright (c) 2013-2017 Intel, Inc. All rights reserved. Copyright (c) 2015 NVIDIA Corporation. All rights reserved. Copyright (c) 2017-2018 Los Alamos National Security, LLC. All rights reserved. Copyright (c) 2017 Research Organization for Information Science and Technology (RIST). All rights reserved. Copyright (c) 2019 Triad National Security, LLC. All rights reserved. Copyright (c) 2020 Huawei Technologies Co.,Ltd. All rights reserved. # HMPI README - [Introduction] - [License] - [Requirements] - [Installation Instructions] - [Installation(Source)] - [Installation(Binary)] - [Testing HMPI] - [Upgrade] - [Examples] ## Introduction HMPI is developed based on ompi and provides an adapter layer for HUCX in which collective algorithms are implemented. About algorithms, refer to 'Introduction' in HUCX for details. Based on performance considerations, HMPI **DO NOT** provide the functionalities related to transmission security. ## License HMPI is licensed as BSD3. ## Requirements * Operating System: * CentOS 7.6 4.14.0-115.el7a.0.1.aarch64 version * gnu: * gnu7 * gnu8 * gnu9 ## Installation Instructions ### Installation (Source) Clone HMPI from Github at the following location: ``` git clone https://github.com/kunpengcompute/hmpi.git ``` By default, we usually install HMPI as follows: ``` cd /where/to/download/ompi touch opal/mca/btl/uct/.opal_ignore ./autogen.pl ./configure --prefix=/where/to/install/ompi --with-platform=contrib/platform/mellanox/optimized --enable-mpi1-compatibility --with-ucx=/where/to/install/ucx make make install ``` export /where/to/install/ompi/bin to PATH and export /where/to/install/ompi/lib to LD_LIBRARY_PATH. ### Installation (Binary) Install the binary as follows: Decompress the binary package to /where/to/install, then export environment variables in bashrc. ``` hwmpi=/where/to/install export OPAL_PREFIX=${hwmpi}/ompi/ export PATH=${hwmpi}/ompi/bin:${hwmpi}/ucx/bin:$PATH export LD_LIBRARY_PATH=${hwmpi}/ompi/lib:${hwmpi}/ucx/lib:$LD_LIBRARY_PATH export INCLUDE=${hwmpi}/ompi/include:$INCLUDE ``` Finally make bashrc become effective. ### Testing HMPI Write C code to a file named 'example.c'. ``` #include <mpi.h> #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <stdbool.h> #include <math.h> int main(int agrc, char *argv[]) { MPI_Init(&agrc, &argv); int rank, size; int root = 0; int x; MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &size); int count = 2; int *buf = malloc(count * sizeof(int)); if (rank == root) { for (x = 0; x < count; x++) { buf[x] = x; } } MPI_Bcast(buf, count, MPI_INT, root, MPI_COMM_WORLD); printf("[%d]buf: %d %d\n", rank, buf[0], buf[1]); free(buf); MPI_Finalize(); return 0; } ``` Compile 'example.c' using: ``` mpicc example.c -o example ``` Run example using: ``` mpirun -np 4 -x UCX_BUILTIN_BCAST_ALGORITHM=2 example ``` Console shows something like(regardless of orders): [1]buf: 0 1 [0]buf: 0 1 [2]buf: 0 1 [3]buf: 0 1 ## Upgrade Remove the installation directory: ``` rm -rf /where/to/install/ompi ``` Then refer to 'Installation Instructions' and install again. ## Examples Here is an example to show you how to use the HMPI: ``` mpirun -np 4 ./executable ``` The independently developed algorithms will be used by default. We recommend that use algorithm 8 for allreduce , use algorithm 7 for barrier and use algorithm 4 for bcast, as follows: ``` mpirun -np 4 -x UCX_BUILTIN_ALLREDUCE_ALGORITHM=8 --map-by socket --rank-by core ./executable mpirun -np 4 -x UCX_BUILTIN_BARRIER_ALGORITHM=7 --map-by socket --rank-by core ./executable mpirun -np 4 -x UCX_BUILTIN_BCAST_ALGORITHM=4 ./executable ``` For other detailed usage, refer to ompi github(https://github.com/open-mpi/ompi).