KeystoneML

The biggest, baddest pipelines around.

Example pipeline

Build the KeystoneML project

./sbt/sbt assembly
make # This builds the native libraries used in KeystoneML

Example: MNIST pipeline

# Get the data from S3
wget http://mnist-data.s3.amazonaws.com/train-mnist-dense-with-labels.data
wget http://mnist-data.s3.amazonaws.com/test-mnist-dense-with-labels.data

KEYSTONE_MEM=4g ./bin/run-pipeline.sh \
  pipelines.images.mnist.MnistRandomFFT \
  --trainLocation ./train-mnist-dense-with-labels.data \
  --testLocation ./test-mnist-dense-with-labels.data \
  --numFFTs 4 \
  --blockSize 2048

Running with spark-submit

To run KeystoneML pipelines on large datasets you will need a Spark cluster. KeystoneML pipelines run on the cluster using spark-submit.

You need to export SPARK_HOME to run KeystoneML using spark-submit. Having done that you can similarly use run-pipeline.sh to launch your pipeline.

export SPARK_HOME=~/spark-1.3.1-bin-cdh4 # should match the version keystone is built with
KEYSTONE_MEM=4g ./bin/run-pipeline.sh \
  pipelines.images.mnist.MnistRandomFFT \
  --trainLocation ./train-mnist-dense-with-labels.data \
  --testLocation ./test-mnist-dense-with-labels.data \
  --numFFTs 4 \
  --blockSize 2048

About

Simplifying robust end-to-end machine learning on Apache Spark.

http://keystone-ml.org/

Apache License 2.0

Languages

Language:Scala 95.9%Language:C++ 2.4%Language:Shell 1.0%Language:Makefile 0.4%Language:R 0.2%Language:Python 0.1%