sunsided / spark-atlas

Spark vs. MongoDB Atlas

PySpark + MongoDB + SingleStore

Use Docker Compose to start the setup

docker compose up
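The compose file itself is not reproduced here, but assuming the three Dockerfile targets described below are wired up as services alongside stock MongoDB and SingleStore images, a docker-compose.yml for this setup might look roughly like the following sketch (service names, images, and ports are assumptions):

```yaml
# Hypothetical sketch of docker-compose.yml; only the three build
# targets (master, worker, jupyter) are taken from the repository.
services:
  master:
    build: { context: ., target: master }
  worker:
    build: { context: ., target: worker }
    depends_on: [master]
  jupyter:
    build: { context: ., target: jupyter }
    ports: ["8888:8888"]      # JupyterLab, per the instructions below
  mongodb:
    image: mongo              # assumed stock image
  singlestore:
    image: singlestore/cluster-in-a-box  # assumed stock image
```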

This will start a setup consisting of:

- a Spark master and worker
- a JupyterLab server
- MongoDB
- SingleStore

Open JupyterLab by connecting to the Jupyter server at 127.0.0.1:8888 and use the following token:

5f69150501c3c0c4f94f5d4ae38123e2f556777f794bf48b
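Jupyter also accepts the token as a query-string parameter, so (assuming the default /lab path) the server can be opened directly via a URL of this form:

```shell
# Build the JupyterLab URL with the token embedded (standard Jupyter
# token-in-query-string form; host and port are from the setup above).
TOKEN=5f69150501c3c0c4f94f5d4ae38123e2f556777f794bf48b
URL="http://127.0.0.1:8888/lab?token=${TOKEN}"
echo "${URL}"
```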

Use the Aggregation Pipelines notebook as a starting point.
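To give a flavor of what such a notebook does, here is a minimal sketch of building a MongoDB aggregation pipeline in Python. The collection, field names, and the connector options shown in the commented-out read are illustrative assumptions, not taken from the actual notebook:

```python
import json

# Hypothetical two-stage aggregation pipeline: filter documents,
# then project a subset of fields.
pipeline = [
    {"$match": {"status": "active"}},
    {"$project": {"_id": 0, "name": 1, "score": 1}},
]

# The MongoDB Spark connector accepts the pipeline as a JSON string.
pipeline_json = json.dumps(pipeline)

# With the Docker setup running, it would be used roughly like this
# (requires the mongo-spark-connector JAR on the Spark classpath):
#
#   df = (spark.read.format("mongodb")
#         .option("connection.uri", "mongodb://mongodb:27017")
#         .option("database", "test")
#         .option("collection", "people")
#         .option("aggregation.pipeline", pipeline_json)
#         .load())

print(pipeline_json)
```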

About the Dockerfile

The Dockerfile (as used in docker-compose.yml) provides three different Docker targets, namely master, worker and jupyter. All three targets share the same base image.

Using the same base image for JupyterLab and Spark was the only way to get this setup working: with only master and worker images plus a predefined PySpark image, runs would consistently fail, either because JARs could not be found or because of serialization errors when executing PySpark programs.
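Under those constraints, the multi-stage layout might be sketched as follows; only the three target names come from the repository, while the base image, Spark invocations, and ports are assumptions:

```dockerfile
# Hypothetical multi-stage Dockerfile: one shared base, three targets.
FROM eclipse-temurin:11-jre AS base          # assumed JVM base image
# ... install Python, Spark, and the MongoDB/SingleStore connector JARs ...

FROM base AS master
CMD ["spark-class", "org.apache.spark.deploy.master.Master"]

FROM base AS worker
# The worker registers with the master service (assumed hostname/port).
CMD ["spark-class", "org.apache.spark.deploy.worker.Worker", "spark://master:7077"]

FROM base AS jupyter
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--no-browser"]
```

Building all three targets from one base guarantees identical Spark, Python, and JAR versions everywhere, which is what avoids the classpath and serialization mismatches described above.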
