bholt / mpich2-yarn

Running MPICH2 on Yarn




The following three sections show how to build mpich2-install.tar.gz and mpich2-yarn-1.0-SNAPSHOT.jar, separately or together.

##Compile the hacked MPICH2

./configure --prefix=/home/<USERNAME>/mpich2-install  --with-pm=smpd --with-pmi=smpd
make install
tar -zcf mpich2-install.tar.gz /home/<USERNAME>/mpich2-install
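
Because the `tar` command above is given an absolute path, the archive stores entries as `home/<USERNAME>/mpich2-install/...` (tar strips the leading `/`). If you would rather have the archive unpack as a plain `mpich2-install/` directory, something like the sketch below may help. The function name `pack_mpich2_install` is hypothetical and not part of mpich2-yarn. The archive should still be extracted so that the tree ends up at the configured `--prefix`.

```shell
# Hypothetical helper (not part of mpich2-yarn): pack an install directory so
# that archive entries are relative to the directory's parent.
pack_mpich2_install() {
    install_dir="$1"   # e.g. /home/<USERNAME>/mpich2-install
    archive="$2"       # e.g. mpich2-install.tar.gz
    parent=$(dirname "$install_dir")
    name=$(basename "$install_dir")
    # -C changes into the parent first, so entries start with "$name/".
    tar -zcf "$archive" -C "$parent" "$name"
}
```

With this layout, running `tar -zxf mpich2-install.tar.gz -C /home/<USERNAME>` on a node would recreate `/home/<USERNAME>/mpich2-install`.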

##Compile mpich2-yarn

mvn clean package

##Compile MPICH2 and mpich2-yarn together

mvn clean package -Dmaven.test.skip=true -DskipMpi=false

This command will generate mpich2-install.tar.gz and mpich2-yarn-1.0-SNAPSHOT.jar together.


##Deploy hacked MPICH2

Distribute mpich2-install.tar.gz to EVERY node of the cluster, including both client nodes and node managers, and extract it.

Client nodes use MPICH2 only for compiling; if you do not want to compile on the client nodes, there is no need to upload MPICH2 to them.

Assuming that the extracted path looks like this:

/home/<USERNAME>/mpich2-install  # where <USERNAME> can be "hadoop" or something

Make sure MPICH2 is on the PATH of the NodeManager; usually we simply add this line to the system environment of the NodeManager:

export PATH=/path/to/mpich2-install/bin:$PATH
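
To confirm that a node's MPICH2 `bin` directory actually contains the tools, a small check along these lines could be run on each node. This is only a sketch: the function name `missing_mpi_tools` is hypothetical, and the tool list (`mpicc`, `mpiexec`) is an assumption about what the install provides.

```shell
# Hypothetical helper: print the MPICH2 tools that are missing (or not
# executable) in the given bin directory. Prints an empty line if none are.
missing_mpi_tools() {
    bin_dir="$1"
    missing=""
    # Assumed tool list; adjust to whatever your install actually ships.
    for tool in mpicc mpiexec; do
        if [ ! -x "$bin_dir/$tool" ]; then
            missing="$missing $tool"
        fi
    done
    # Drop the leading space before printing.
    echo "${missing# }"
}
```

For example, `missing_mpi_tools /path/to/mpich2-install/bin` prints nothing but an empty line when both tools are present.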

##Deploy mpich2-yarn

Upload mpich2-yarn-1.0-SNAPSHOT.jar to the client nodes.

Assuming that the path looks like this:


The ApplicationMaster, Container and Client are all packed in this jar, and it will be distributed automatically when submitting an MPI application.


##Client nodes

Configure the following environment:

export HADOOP_HOME=/home/<USERNAME>/hadoop-current
export HADOOP_CONF_DIR=/home/<USERNAME>/hadoop-conf
export MPI_HOME=/home/<USERNAME>/mpich2-install
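
Before submitting anything, it can be useful to verify that these three variables are set in the current shell. The sketch below is one way to do that; the function name `check_client_env` is hypothetical and it only checks that the variables are non-empty, not that the paths exist.

```shell
# Hypothetical helper: report which of the client-side variables are unset
# or empty. Returns 0 if all are set, 1 otherwise.
check_client_env() {
    rc=0
    for name in HADOOP_HOME HADOOP_CONF_DIR MPI_HOME; do
        # Indirect lookup of the variable named by $name (POSIX sh).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            rc=1
        fi
    done
    return $rc
}
```

Calling `check_client_env || exit 1` at the top of a submit script stops early when the environment is incomplete.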

Enter $HADOOP_CONF_DIR and create mpi-site.xml:

<?xml version="1.0"?>

The HDFS path is where the MPI jobs and ApplicationMasters are saved.

##Node Managers

Add the following lines to the environment of the node managers.

export MPI_HOME=/home/<USERNAME>/mpich2-install
export PATH=$MPI_HOME/bin:$PATH

#Submit Jobs


On the client nodes:

mpicc -o cpi cpi.c
hadoop jar mpich2-yarn-1.0-SNAPSHOT.jar -a cpi -M 1024 -m 1024 -n 2

Hello world

hadoop jar mpich2-yarn-1.0-SNAPSHOT.jar -a hellow -M 1024 -m 1024 -n 2
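
Both submit commands above share the same shape, so a small dry-run wrapper can reduce typing mistakes. The sketch below only prints the command. The function name `mpi_submit_cmd` is hypothetical, and because this README does not explain `-M` and `-m`, the wrapper simply passes the same value to both, as the examples do.

```shell
# Hypothetical wrapper: print (not run) a submit command in the same form
# as the examples above.
mpi_submit_cmd() {
    app="$1"            # path to the compiled MPI program, e.g. cpi
    procs="$2"          # value for -n
    mem="${3:-1024}"    # value passed to both -M and -m (defaults to 1024)
    echo "hadoop jar mpich2-yarn-1.0-SNAPSHOT.jar -a $app -M $mem -m $mem -n $procs"
}
```

Printing the command first makes it easy to inspect; once it looks right, `mpi_submit_cmd hellow 2 | sh` would run it.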


svn checkout plda  # Prepare source code
cd plda
make  # call mpicc to compile
cd ..

Put the input data into HDFS (test data is included in the PLDA source directory):

hadoop fs -mkdir /group/dc/zhuoluo.yzl/plda_input
hadoop fs -put plda/testdata/test_data.txt /group/dc/zhuoluo.yzl/plda_input/
hadoop jar mpich2-yarn-1.0-SNAPSHOT.jar -a plda/mpi_lda -M 1024 -m 1024 -n 2 \
 -o "--num_topics 2 --alpha 0.1 --beta 0.01 --training_data_file MPIFILE1 --model_file MPIOUTFILE1 --total_iterations 150" \
 -DMPIFILE1=/group/dc/zhuoluo.yzl/plda_input -SMPIFILE1=true -OMPIOUTFILE1=/group/dc/zhuoluo.yzl/lda_model_output.txt -ppc 2

