Sparkle: Apache Spark applications in Haskell
Sparkle [spär′kəl]: a library for writing resilient analytics applications in Haskell that scale to thousands of nodes, using Spark and the rest of the Apache ecosystem under the hood.
This is an early tech preview, not production ready.
Getting started
The tl;dr using the hello app as an example on your local machine:
$ stack build hello
$ mvn -f sparkle -Dsparkle.app=sparkle-example-hello package
$ spark-submit --master 'local[1]' sparkle/target/sparkle-0.1.jar
Requirements:
- the Stack build tool;
- either the Nix package manager,
- or OpenJDK, Maven and Spark >= 1.6 installed from your distro.
To run a Spark application the process is as follows:
- create an application in the apps/ folder, in-repo or as a submodule;
- add your app to stack.yaml;
- build the app;
- package your app into a deployable JAR container;
- submit it to a local or cluster deployment of Spark.
If you run into issues, read the Troubleshooting section below first.
To build:
$ stack [--nix] build
You can optionally pass --nix to all Stack commands to ask Nix to
provision a local Spark and Maven in a local sandbox, which makes
builds more reproducible. Otherwise you'll need these installed through
your OS distribution's package manager for the next steps (and you'll
need to tell Stack how to find the JVM header files and shared
libraries).
To package your app (omit the square bracket part entirely if you're
not using --nix):
$ [stack --nix exec --] \
mvn -f sparkle -Dsparkle.app=<app-executable-name> package
Finally, to run your application, for example locally:
$ [stack --nix exec --] \
spark-submit --master 'local[1]' sparkle/target/sparkle-0.1.jar
See here for other options, including launching a whole cluster from scratch on EC2.
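If you find yourself retyping the three steps above, a small shell helper can print them for a given app. This is only a sketch: `sparkle_plan` is a hypothetical name, not something shipped with Sparkle, and it assumes the JAR path and `local[1]` master used in the examples above. It prints the commands rather than running them, so you can check them first.

```shell
# Hypothetical helper, not part of Sparkle: print the build, package and
# submit commands for one app, in the order described above.
# Usage: sparkle_plan <stack-target> <app-executable-name> [--nix]
sparkle_plan() (
    target=$1
    app=$2
    prefix=""
    nix=""
    # With --nix, Maven and Spark come from the Nix sandbox, so the
    # package and submit steps are wrapped in "stack --nix exec --".
    if [ "${3:-}" = "--nix" ]; then
        prefix="stack --nix exec -- "
        nix="--nix "
    fi
    echo "stack ${nix}build ${target}"
    echo "${prefix}mvn -f sparkle -Dsparkle.app=${app} package"
    echo "${prefix}spark-submit --master 'local[1]' sparkle/target/sparkle-0.1.jar"
)
```

For instance, `sparkle_plan hello sparkle-example-hello` prints the same three commands as the tl;dr, and adding `--nix` as a third argument prints the Nix-wrapped variants.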
Troubleshooting
jvm library or header files not found
You'll need to tell Stack where to find your local JVM installation.
Something like the following in your ~/.stack/config.yaml should do
the trick, but check that the paths match what's on your system:
extra-include-dirs: [/usr/lib/jvm/java-7-openjdk-amd64/include]
extra-lib-dirs: [/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server]
Or use --nix: since it won't use your globally installed JDK, it
will have no trouble finding its own locally installed one.
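If you are not using --nix and are unsure which paths to put in the config, something like the following sketch may help. `jvm_stack_config` is a hypothetical helper, not part of Sparkle or Stack. It assumes a JDK root that contains `include/jni.h` and a `server/libjvm.so` somewhere beneath it, as in the OpenJDK 7 layout shown above. Other JDKs may be laid out differently, so treat the output as a starting point.

```shell
# Hypothetical helper, not part of Sparkle or Stack: given a JDK root,
# print candidate extra-include-dirs and extra-lib-dirs lines.
# Usage: jvm_stack_config <jdk-root>
jvm_stack_config() (
    jdk=$1
    inc="$jdk/include"
    # The JNI headers are expected directly under <jdk-root>/include.
    [ -f "$inc/jni.h" ] || { echo "jni.h not found under $inc" >&2; exit 1; }
    # Look for the server JVM shared library anywhere below the JDK root.
    lib=$(find "$jdk" -path '*/server/libjvm.so' 2>/dev/null | head -n 1)
    [ -n "$lib" ] || { echo "server/libjvm.so not found under $jdk" >&2; exit 1; }
    echo "extra-include-dirs: [$inc]"
    echo "extra-lib-dirs: [$(dirname "$lib")]"
)
```

Run it as, for example, `jvm_stack_config /usr/lib/jvm/java-7-openjdk-amd64` and compare the two printed lines with what you have in ~/.stack/config.yaml. The function returns a non-zero status and prints a message on stderr if either file is missing.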
License
Copyright (c) 2015-2016 Tweag I/O Limited.
All rights reserved.
Sparkle is free software, and may be redistributed under the terms specified in the LICENSE file.
About
Sparkle is maintained by Tweag I/O.
Have questions? Need help? Tweet at @tweagio.