amcbarnett / AlluxioHiBench

Configure HiBench for Alluxio


AlluxioHiBench

Environment Setup

Link python to python2:

sudo ln -s /usr/bin/python /usr/bin/python2

Make the Alluxio client jar available to Yarn:

cp ALLUXIO_HOME_DIR/core/client/target/alluxio-core-client--jar-with-dependencies.jar /hadoop/share/hadoop/yarn/lib

~/vagrant-utils/copy-dir /hadoop/share/hadoop/yarn/lib

Installing HiBench

HiBench is available through a git repository. Clone it like any other repository:

git clone https://github.com/intel-hadoop/HiBench.git

Create a user properties file.

cd conf
cp 99-user_defined_properties.conf.template 99-user_defined_properties.conf

Run HiBench on Alluxio and Hadoop

Update the configuration in 99-user_defined_properties.conf:

hibench.hadoop.home HadoopInstallDir
hibench.spark.home SparkInstallDir
hibench.hdfs.master alluxio://AlluxioMasterHost:19998
hibench.scale.profile <tiny, small, large, huge, gigantic, or bigdata>
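When these settings are applied from a script rather than by hand, a small helper that replaces a key if it is already present and appends it otherwise can keep the file free of duplicates. The following is a minimal sketch, not part of HiBench: the function name `set_prop` is hypothetical, and it assumes each setting is a key followed by whitespace and a value on a single line, as in the settings above.

```shell
# set_prop FILE KEY VALUE  (hypothetical helper, not shipped with HiBench)
# Replaces the line whose first field is KEY, or appends KEY if it is absent.
set_prop() {
  file=$1
  key=$2
  value=$3
  touch "$file"
  tmp=$(mktemp) || return 1
  # Keep every line except an existing setting for this key.
  awk -v k="$key" '$1 != k' "$file" > "$tmp"
  # Append the new setting as "key<TAB>value".
  printf '%s\t%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `set_prop 99-user_defined_properties.conf hibench.scale.profile tiny` would select the tiny profile and leave comments and other keys untouched.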

The scale profiles are defined in https://github.com/intel-hadoop/HiBench/blob/master/conf/10-data-scale-profile.conf

Change the languages to only mapreduce for Hadoop, or to the following for Spark:

spark/java
spark/scala
spark/python

cd conf
echo "mapreduce" > languages.lst

Spark

Spark runs on Yarn by default in yarn-client mode. To avoid Yarn, change "yarn-client" to "spark://MASTER_HOSTNAME:7077".
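The switch away from Yarn can be scripted as a plain text substitution. The sketch below assumes the "yarn-client" value lives in 99-user_defined_properties.conf; the function name `use_standalone_master` is hypothetical, and MASTER_HOSTNAME remains a placeholder for the real Spark master host.

```shell
# use_standalone_master FILE MASTER_URL  (hypothetical helper)
# Rewrites every "yarn-client" in FILE to MASTER_URL and keeps a copy as FILE.bak.
use_standalone_master() {
  file=$1
  master_url=$2
  if ! grep -q 'yarn-client' "$file"; then
    echo "no yarn-client setting found in $file" >&2
    return 1
  fi
  # "|" is the sed delimiter because the URL contains "/".
  sed -i.bak "s|yarn-client|$master_url|g" "$file"
}
```

A call such as `use_standalone_master 99-user_defined_properties.conf spark://MASTER_HOSTNAME:7077` leaves the original file next to the edited one as a `.bak` copy, so the change is easy to revert.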

Additional Spark configuration can be specified by adding spark configuration keys to 99-user_defined_properties.conf. Some potentially useful configurations are:

# Make it possible to see Spark job information after the job has completed.
spark.eventLog.enabled true
spark.eventLog.dir /tmp/spark/eventlog
# Force Spark to run tasks local to their data.
spark.locality.wait 100000

If using spark/python, install numpy on each machine, both master and workers:

sudo yum install numpy

Run HiBench tests. Note the arguments that specify the versions of Spark and MapReduce:

cd src && mvn clean package -D spark1.6 -D MR2
~/HiBench/bin/run-all.sh

View results in the report folder.
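A quick way to inspect the results from the command line might look like the sketch below. It rests on two assumptions: that HiBench was cloned to ~/HiBench, as the run-all.sh path above suggests, and that the summary file in the report folder is named hibench.report. The function name `show_report` is hypothetical.

```shell
# show_report [REPORT_DIR]  (hypothetical helper)
# Prints the summary report, assumed to be REPORT_DIR/hibench.report.
show_report() {
  report_dir=${1:-"$HOME/HiBench/report"}
  report="$report_dir/hibench.report"
  if [ -f "$report" ]; then
    cat "$report"
  else
    echo "no report found at $report" >&2
    return 1
  fi
}
```

Calling `show_report` with no argument reads from the assumed default location. Pass a different directory if the report folder lives elsewhere.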
