patriciabonaldy / docker-spark

Sets up a cluster and manager with Spark and Hadoop.


Author

Patricia Bonaldy


Versions

HADOOP_VERSION 3.0.0
SPARK_VERSION 2.4.1
JDK 1.8
PYTHON 2.7
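
To double-check these versions from inside the image, a short PySpark script like the following should work (a sketch; sc._jvm is an internal py4j handle, so treat the Hadoop lookup as best-effort):

import sys
import pyspark

sc = pyspark.SparkContext()
print("Python:", sys.version.split()[0])
print("Spark:", sc.version)
# internal py4j handle into the JVM; VersionInfo is Hadoop's version API
print("Hadoop:", sc._jvm.org.apache.hadoop.util.VersionInfo.getVersion())
sc.stop()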

Docker example

To run SparkPi, run the image with Docker:

docker run --rm -it -p 4040:4040 patybon/spark_docker_pati bin/run-example SparkPi 10
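
SparkPi estimates pi by sampling random points in the unit square and counting how many fall inside the quarter circle. A rough PySpark equivalent of the bundled Scala example (a sketch, not the script the image ships):

import random
import pyspark

sc = pyspark.SparkContext()
n = 100000

def inside(_):
    # a point (x, y) drawn uniformly from the unit square
    x, y = random.random(), random.random()
    return x * x + y * y < 1

hits = sc.parallelize(range(n)).filter(inside).count()
print("Pi is roughly", 4.0 * hits / n)
sc.stop()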

To start spark-shell with your AWS credentials:

docker run --rm -it -e "AWS_ACCESS_KEY_ID=YOURKEY" -e "AWS_SECRET_ACCESS_KEY=YOURSECRET" -p 4040:4040 patybon/spark_docker_pati bin/spark-shell
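
With the credentials in the environment, Spark can read from S3 through the s3a:// scheme, assuming the image bundles the hadoop-aws connector; the bucket and key below are placeholders:

import pyspark

sc = pyspark.SparkContext()
# hadoop-aws picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the
# environment via its default credential provider chain
lines = sc.textFile("s3a://your-bucket/path/to/file.txt")
print(lines.count())
sc.stop()

Save it as s3_count.py and submit it the same way as count.py below.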

To run a simple count job with PySpark:

echo -e "import pyspark\n\nprint(pyspark.SparkContext().parallelize(range(0, 10)).count())" > count.py
docker run --rm -it -p 4040:4040 -v $(pwd)/count.py:/count.py patybon/spark_docker_pati bin/spark-submit /count.py
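
The same count.py written out in full, if the one-liner is hard to read:

# count.py: distribute the numbers 0..9 across Spark and count them
import pyspark

sc = pyspark.SparkContext()
print(sc.parallelize(range(0, 10)).count())
sc.stop()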

docker-compose example

To create a simple standalone cluster with docker-compose:

docker-compose up

The Spark master UI will be running at http://${YOUR_DOCKER_HOST}:8080 with one worker listed. To run pyspark, exec into the master container:

docker exec -it docker-spark_master_1 /bin/bash
bin/pyspark
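
Inside the pyspark shell, a small job confirms the worker is picking up tasks (sc is the SparkContext the shell creates for you):

# run inside bin/pyspark on the master
rdd = sc.parallelize(range(100), 4)  # 4 partitions, scheduled on the worker
print(rdd.count())                   # expect 100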

To run SparkPi, exec into a container:

docker exec -it docker-spark_master_1 /bin/bash
bin/run-example SparkPi 10
