# Docker container for Apache Spark

Requires Docker Compose >= 1.9.0.
On a local setup:

```sh
docker-compose up -d
```
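The repository's `docker-compose.yml` defines what `docker-compose up -d` starts. For orientation only, a standalone Spark cluster in Compose typically looks like the following sketch; the image name, commands, and service layout are assumptions, not the repository's actual file (the ports match those referenced later in this README):

```yaml
# Minimal sketch of a standalone Spark cluster in Compose (v2 format).
# Image name and commands are assumptions, not the repository's actual file.
version: "2"
services:
  spark-master:
    image: biggis/spark:2.1.0      # assumed image name
    command: master
    ports:
      - "8080:8080"                # Spark WebUI (see the note at the end of this README)
      - "7077:7077"                # master RPC endpoint for workers and drivers
      - "6066:6066"                # REST submission endpoint used by the curl deploy step
  spark-worker:
    image: biggis/spark:2.1.0      # assumed image name
    command: worker
    depends_on:
      - spark-master
```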
On Rancher:

Note: The HDFS stack should be deployed and running before the Spark stack is deployed.

- An NFS server and the Rancher NFS service need to be configured in the cluster. The NFS volume `spark-home` needs to be created via the Rancher WebUI; it is required for Apache Zeppelin.
- Add the host label `spark-master=true` to any of your hosts (see the sketch after this list).
- Create a new Spark stack `spark` via the Rancher WebUI and deploy `docker-compose.rancher.yml`.
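In Rancher 1.x Compose files, host labels are targeted through scheduler affinity labels. A minimal sketch of how `docker-compose.rancher.yml` might pin the master service to the labeled host; the service and image names here are assumptions:

```yaml
# Sketch only: pins the master service to a host labeled spark-master=true.
spark-master:
  image: biggis/spark:2.1.0   # assumed image name
  labels:
    io.rancher.scheduler.affinity:host_label: spark-master=true
```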
Build the Spark sample job:

```sh
cd job/spark-example
mvn clean package
```
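The sample job implements a WordCount (see the example further below). As a point of reference, a minimal WordCount against the Spark 2.1 Java RDD API looks like the following sketch; the package and class names are assumptions and need not match the repository's actual code:

```java
package example; // assumed package name

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        // The master URL is supplied by spark-submit / the REST submission request.
        SparkConf conf = new SparkConf().setAppName("WordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read the input file (e.g. hdfs://.../demo/hamlet.txt) passed as the first argument.
        JavaRDD<String> lines = sc.textFile(args[0]);

        // Split lines into words and count occurrences per word.
        JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey(Integer::sum);

        counts.saveAsTextFile(args[1]); // write results, e.g. back to HDFS
        sc.stop();
    }
}
```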
The image `biggis/spark-client:2.1.0` can be used to submit Spark jobs to the Spark cluster. Edit the environment variables and volumes in `docker-compose.client.yml` according to your setup and specify which Spark job (jar and class) to submit. The jar file is mapped as a local volume. Then run the `docker-compose.client.yml` file as follows:

```sh
docker-compose -f docker-compose.client.yml run --rm spark-client
```
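For orientation, `docker-compose.client.yml` might be structured like the sketch below. The environment variable names and mount paths here are assumptions; consult the actual file for the names it expects:

```yaml
# Sketch only: variable names and paths are assumptions, not the actual file.
version: "2"
services:
  spark-client:
    image: biggis/spark-client:2.1.0
    environment:
      SPARK_MASTER: "spark://spark-master:7077"   # assumed: cluster to submit to
      SPARK_DRIVER_CLASS: "example.WordCount"     # assumed: main class of the job
    volumes:
      # Map the locally built jar into the container.
      - ./job/spark-example/target/spark-example-1.0-SNAPSHOT.jar:/jobs/spark-example.jar
```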
You can also upload the job jar to HDFS and deploy the Spark job via curl.
## Example: WordCount on `hamlet.txt`
- Upload `hamlet.txt` from the biggis-hdfs repository:

  ```sh
  curl -u hdfs:password \
       -F 'file=@data/hamlet.txt' \
       -X POST 'http://localhost:3000/api/v1/upload/files?hdfspath=/demo/hamlet.txt'
  ```
- Upload the packaged `spark-example-1.0-SNAPSHOT.jar`:

  ```sh
  curl -u hdfs:password \
       -F 'job=@job/spark-example/target/spark-example-1.0-SNAPSHOT.jar' \
       'http://localhost:3000/api/v1/upload/jobs?hdfspath=/jobs/spark-example'
  ```
- Deploy the Spark job:

  ```sh
  curl -X POST http://localhost:6066/v1/submissions/create \
       --header "Content-Type:application/json;charset=UTF-8" \
       --data @wordcount-job.json
  ```
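Port `6066` is the REST submission endpoint of a standalone Spark master, so `wordcount-job.json` is the request body of a `CreateSubmissionRequest`. A minimal sketch, assuming the HDFS paths from the upload steps above; the NameNode host and port, the main class, and the app name are assumptions:

```json
{
  "action": "CreateSubmissionRequest",
  "clientSparkVersion": "2.1.0",
  "appResource": "hdfs://namenode:8020/jobs/spark-example/spark-example-1.0-SNAPSHOT.jar",
  "mainClass": "example.WordCount",
  "appArgs": ["hdfs://namenode:8020/demo/hamlet.txt"],
  "environmentVariables": { "SPARK_ENV_LOADED": "1" },
  "sparkProperties": {
    "spark.app.name": "WordCount",
    "spark.master": "spark://spark-master:7077",
    "spark.submit.deployMode": "cluster",
    "spark.jars": "hdfs://namenode:8020/jobs/spark-example/spark-example-1.0-SNAPSHOT.jar"
  }
}
```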
The Spark WebUI runs on port `8080`.