# Docker container for Apache Spark

Requires Docker Compose >= 1.9.0.
On a local setup:

```sh
docker-compose up -d
```
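The repository's `docker-compose.yml` defines what `docker-compose up -d` starts. For orientation only, a standalone Spark cluster in Compose typically looks like the following sketch; the image name, commands, and service layout are assumptions, not the repository's actual file (the ports match those referenced later in this README):

```yaml
# Minimal sketch of a standalone Spark cluster in Compose (v2 format).
# Image name and commands are assumptions, not the repository's actual file.
version: "2"
services:
  spark-master:
    image: biggis/spark:2.1.0      # assumed image name
    command: master
    ports:
      - "8080:8080"                # Spark WebUI (see the note at the end of this README)
      - "7077:7077"                # master RPC endpoint for workers and drivers
      - "6066:6066"                # REST submission endpoint used by the curl deploy step
  spark-worker:
    image: biggis/spark:2.1.0      # assumed image name
    command: worker
    depends_on:
      - spark-master
```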
On Rancher:

Note: The HDFS stack should be deployed and running before the Spark stack is deployed.

- An NFS server and the Rancher NFS service need to be configured in the cluster. The NFS volume `spark-home` needs to be created via the Rancher WebUI; it is required for Apache Zeppelin.
- Add the host label `spark-master=true` to any of your hosts (see the sketch after this list).
- Create a new Spark stack `spark` via the Rancher WebUI and deploy `docker-compose.rancher.yml`.
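In Rancher 1.x Compose files, host labels are targeted through scheduler affinity labels. A minimal sketch of how `docker-compose.rancher.yml` might pin the master service to the labeled host; the service and image names here are assumptions:

```yaml
# Sketch only: pins the master service to a host labeled spark-master=true.
spark-master:
  image: biggis/spark:2.1.0   # assumed image name
  labels:
    io.rancher.scheduler.affinity:host_label: spark-master=true
```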
Build the Spark sample job:

```sh
cd job/spark-example
mvn clean package
```
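The sample job implements a WordCount (see the example further below). As a point of reference, a minimal WordCount against the Spark 2.1 Java RDD API looks like the following sketch; the package and class names are assumptions and need not match the repository's actual code:

```java
package example; // assumed package name

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        // The master URL is supplied by spark-submit / the REST submission request.
        SparkConf conf = new SparkConf().setAppName("WordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read the input file (e.g. hdfs://.../demo/hamlet.txt) passed as the first argument.
        JavaRDD<String> lines = sc.textFile(args[0]);

        // Split lines into words and count occurrences per word.
        JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey(Integer::sum);

        counts.saveAsTextFile(args[1]); // write results, e.g. back to HDFS
        sc.stop();
    }
}
```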
The image `biggis/spark-client:2.1.0` can be used to submit Spark jobs to the Spark cluster. Edit the environment variables and volumes in `docker-compose.client.yml` according to your setup and specify which Spark job (jar and class) to submit. The jar file is mapped as a local volume. Then run the `docker-compose.client.yml` file as follows:

```sh
docker-compose -f docker-compose.client.yml run --rm spark-client
```
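For orientation, `docker-compose.client.yml` might be structured like the sketch below. The environment variable names and mount paths here are assumptions; consult the actual file for the names it expects:

```yaml
# Sketch only: variable names and paths are assumptions, not the actual file.
version: "2"
services:
  spark-client:
    image: biggis/spark-client:2.1.0
    environment:
      SPARK_MASTER: "spark://spark-master:7077"   # assumed: cluster to submit to
      SPARK_DRIVER_CLASS: "example.WordCount"     # assumed: main class of the job
    volumes:
      # Map the locally built jar into the container.
      - ./job/spark-example/target/spark-example-1.0-SNAPSHOT.jar:/jobs/spark-example.jar
```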
You can also upload the job jar to HDFS and deploy the Spark job via curl.
## Example: WordCount on `hamlet.txt`
- Upload `hamlet.txt` from the biggis-hdfs repository:

  ```sh
  curl -u hdfs:password \
       -F 'file=@data/hamlet.txt' \
       -X POST 'http://localhost:3000/api/v1/upload/files?hdfspath=/demo/hamlet.txt'
  ```
- Upload the packaged `spark-example-1.0-SNAPSHOT.jar`:

  ```sh
  curl -u hdfs:password \
       -F 'job=@job/spark-example/target/spark-example-1.0-SNAPSHOT.jar' \
       'http://localhost:3000/api/v1/upload/jobs?hdfspath=/jobs/spark-example'
  ```
- Deploy the Spark job:

  ```sh
  curl -X POST http://localhost:6066/v1/submissions/create \
       --header "Content-Type:application/json;charset=UTF-8" \
       --data @wordcount-job.json
  ```
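Port `6066` is the REST submission endpoint of a standalone Spark master, so `wordcount-job.json` is the request body of a `CreateSubmissionRequest`. A minimal sketch, assuming the HDFS paths from the upload steps above; the NameNode host and port, the main class, and the app name are assumptions:

```json
{
  "action": "CreateSubmissionRequest",
  "clientSparkVersion": "2.1.0",
  "appResource": "hdfs://namenode:8020/jobs/spark-example/spark-example-1.0-SNAPSHOT.jar",
  "mainClass": "example.WordCount",
  "appArgs": ["hdfs://namenode:8020/demo/hamlet.txt"],
  "environmentVariables": { "SPARK_ENV_LOADED": "1" },
  "sparkProperties": {
    "spark.app.name": "WordCount",
    "spark.master": "spark://spark-master:7077",
    "spark.submit.deployMode": "cluster",
    "spark.jars": "hdfs://namenode:8020/jobs/spark-example/spark-example-1.0-SNAPSHOT.jar"
  }
}
```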
The Spark WebUI runs on port `8080`.