A Dockerized environment for Jupyter notebooks and an MLflow server, providing an easy and customizable setup. Seamlessly connect the Jupyter and MLflow servers to streamline workflows and track experiments.
Setting up the necessary environment for your next project can be a time-consuming and frustrating process. This became clear during my recent MLOps class where many of us struggled to set up Jupyter Notebook and MLflow server environments.
The project lets you customize the Jupyter Notebook and MLflow server configurations through Docker Compose, so switching between different setups is straightforward.
- Dockerized environment for Jupyter notebooks and MLflow server, allowing for easy setup and deployment.
- Choice of Jupyter Notebook stacks, including `jupyter/minimal-notebook`, `jupyter/scipy-notebook`, `jupyter/all-spark-notebook`, and more.
- MLflow configurable metadata store: you can configure where metadata (e.g. metrics, parameters) is stored. You can use a relational database such as MySQL or PostgreSQL, or SQLite, which is the default. Alternatively, you can configure the system to use Amazon S3 for storing the metadata.
- MLflow configurable artifact store: you can configure where artifacts are stored. By default, artifacts are stored on the local file system, but you can configure the system to use cloud storage such as AWS S3, Google Cloud Storage, or Azure.
- Streamlined connection between Jupyter Notebook and MLflow server.
Make sure to review the Prerequisites section below before getting started.
This application ships with a Docker Compose environment and requires Docker to be installed and running locally. If you're not familiar with Docker or don't have it installed, please visit the official website to learn more and follow the Docker installation instructions for your platform:
Docker for Mac
Docker for Linux
Docker for Windows
The project runs in two containers, currently built on top of the `jupyter/scipy-notebook` and Python 3.10 slim images. Both images can be customized via `.env`: choose the version and system distribution that fit your needs. A `requirements.txt` file is included and can be customized to add more dependencies.
Make sure to copy the `.env.example` file located in the root directory of the project and rename the copy to `.env` before running the containers.
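The copy step can also be scripted. The following is a minimal sketch, assuming it is run from the project root; the helper name `ensure_env_file` is hypothetical and not part of the project.

```python
import shutil
from pathlib import Path


def ensure_env_file(project_root="."):
    """Copy .env.example to .env unless .env already exists.

    Returns True if a new .env was created, False if one was already present.
    """
    root = Path(project_root)
    target = root / ".env"
    if target.exists():
        # Never overwrite an existing .env, it may hold local customizations.
        return False
    shutil.copyfile(root / ".env.example", target)
    return True
```

Calling `ensure_env_file()` before starting the containers leaves an existing `.env` untouched, so local changes are not lost.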
Variable | Description | Default Value |
---|---|---|
`JUPYTER_BASE_IMAGE` | Jupyter Notebook base image | `jupyter/scipy-notebook` |
`JUPYTER_BASE_VERSION` | Jupyter Notebook image version | `latest` |
`JUPYTER_PORT` | Jupyter Notebook port | `8888` |
`JUPYTER_HOST_PORT` | Jupyter Notebook host port | `8899` |
`JUPYTER_TOKEN` | Jupyter Notebook default token | `jupyter` |

Variable | Description | Default Value |
---|---|---|
`PYTHON_VERSION` | Python version | `3.10` |
`DEBIAN_VERSION` | Debian version | `slim-buster` |
`MLFLOW_VERSION` | MLflow version | `2.3.1` |
`MLFLOW_SERVER_PORT` | MLflow server port | `5000` |
`MLFLOW_SERVER_HOST_PORT` | MLflow server host port | `5001` |
`MLFLOW_BACKEND_STORE` | MLflow backend store | `sqlite:////mlflow/mlruns/runs.db` |
`MLFLOW_ARTIFACT_STORE` | MLflow artifact store | `/home/jovyan/artifacts` |
`MLFLOW_TRACKING_URI` | MLflow tracking URI | `http://mlflow-server:5000` |
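To see how the host port variables translate into the URLs you open in a browser, here is a rough sketch. The parser is deliberately simplified (plain `KEY=VALUE` lines and `#` comments only) and does not reproduce Docker Compose's full `.env` syntax; `parse_env` and `host_urls` are hypothetical helper names, and the fallback ports are the defaults from the tables above.

```python
# Defaults taken from the tables above; used when a variable is not set.
DEFAULTS = {"JUPYTER_HOST_PORT": "8899", "MLFLOW_SERVER_HOST_PORT": "5001"}


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def host_urls(env):
    """Build the localhost URLs implied by the host port settings."""
    merged = {**DEFAULTS, **env}
    return {
        "jupyter": f"http://localhost:{merged['JUPYTER_HOST_PORT']}",
        "mlflow": f"http://localhost:{merged['MLFLOW_SERVER_HOST_PORT']}",
    }
```

For example, `host_urls(parse_env(open(".env").read()))` would report where each service is exposed after you change a port in `.env`.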
Note: The MLflow tracking server is configured through three storage-related settings, each of which can point to a local or a remote location:

- `MLFLOW_BACKEND_STORE`: The backend store is where the MLflow tracking server stores experiment and run metadata (params, metrics, and tags). It can be a file store or a database-backed store such as MySQL, PostgreSQL, or SQLite (the default). By default it is stored locally on the MLflow server.
  - Example: `sqlite:////mlflow/mlruns.db` or `postgresql://username:password@host:port/database`.
- `MLFLOW_ARTIFACT_STORE`: The artifact store is a location suitable for large data (such as an S3 bucket or a shared NFS file system) and is where clients log their artifact output (for example, models).
  - Example: `file:///local/path/mlruns`, `s3://bucket/path`, `azure://bucket/path`, `hdfs://namenode/path`, or `file:///local/path`.
- `MLFLOW_TRACKING_URI`: The environment variable that tells clients where to log runs. It plays the same role as `MLFLOW_BACKEND_STORE`, but for logging to a remote tracking server.
  - Example: `http://localhost:5000`, `https://my-tracking-server:5000`, or `databricks://<profileName>`.
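The example URIs above differ mainly in their scheme. As a small illustration, the sketch below groups a URI by scheme using only the standard library. The grouping and the function name `describe_store_uri` are assumptions made for this example (the `mysql` scheme in particular is inferred from the text rather than from an example URI); MLflow itself decides which schemes it actually supports.

```python
from urllib.parse import urlparse

# Hypothetical grouping of schemes, based only on the examples in this README.
DATABASE_SCHEMES = {"sqlite", "postgresql", "mysql"}
REMOTE_STORAGE_SCHEMES = {"s3", "azure", "hdfs"}
TRACKING_SCHEMES = {"http", "https", "databricks"}


def describe_store_uri(uri):
    """Return a rough description of where a store URI points."""
    scheme = urlparse(uri).scheme
    if scheme in DATABASE_SCHEMES:
        return "database-backed store"
    if scheme in REMOTE_STORAGE_SCHEMES:
        return "remote storage"
    if scheme in TRACKING_SCHEMES:
        return "remote tracking server"
    if scheme in ("", "file"):
        # Plain paths such as /home/jovyan/artifacts have no scheme.
        return "local file system"
    return "unknown"
```

A check like this can be handy for catching a mistyped value in `.env` before the containers start.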
To run only the Jupyter Notebook container, build and run it with the following commands:
docker build -t nassarx/mlflow-notebook:1.0 \
-f ./docker/jupyter \
--build-arg JUPYTER_BASE_IMAGE=<image> \
--build-arg JUPYTER_BASE_VERSION=<version> .
docker run \
-p <host_port>:<container_port> \
-e GRANT_SUDO=yes \
-e JUPYTER_TOKEN=<token> \
-v <local_notebooks_dir>:/home/jovyan/work \
-v <local_mlruns_dir>:/home/jovyan/mlruns \
nassarx/mlflow-notebook:1.0
Alternatively, build and run the container with Docker Compose:
docker-compose up mlflow-notebook
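If you prefer to launch the notebook container from a script, the `docker run` invocation above can be assembled as an argument list. This is only a sketch: the function name `jupyter_run_command` is hypothetical, the ports and token are the defaults from the configuration table, and the `./notebooks` and `./mlruns` directories are assumed from the project layout.

```python
import shlex
from pathlib import Path


def jupyter_run_command(host_port, container_port, token, notebooks_dir, mlruns_dir,
                        image="nassarx/mlflow-notebook:1.0"):
    """Build the argument list for the docker run command shown above."""
    # Resolve to absolute paths so the bind mounts do not depend on the shell's cwd.
    notebooks = Path(notebooks_dir).resolve()
    mlruns = Path(mlruns_dir).resolve()
    return [
        "docker", "run",
        "-p", f"{host_port}:{container_port}",
        "-e", "GRANT_SUDO=yes",
        "-e", f"JUPYTER_TOKEN={token}",
        "-v", f"{notebooks}:/home/jovyan/work",
        "-v", f"{mlruns}:/home/jovyan/mlruns",
        image,
    ]


command = jupyter_run_command(8899, 8888, "jupyter", "./notebooks", "./mlruns")
# Print the command for review instead of executing it.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would start the container; printing it first makes it easy to check the ports and mounts.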
To run only the MLflow server container, build and run it with the following commands:
docker build -t nassarx/mlflow-server:1.0 \
-f ./docker/mlflow \
--build-arg PYTHON_VERSION=<version> \
--build-arg DEBIAN_VERSION=<version> \
--build-arg MLFLOW_VERSION=<version> \
--build-arg MLFLOW_SERVER_PORT=<port> .
docker run --name mlflow-server \
-p <host_port>:<container_port> \
-e MLFLOW_BACKEND_STORE=<backend_store> \
-e MLFLOW_TRACKING_URI=<tracking_uri> \
-v <local_mlflow_dir>:/home/jovyan/mlruns \
nassarx/mlflow-server:1.0
Alternatively, build and run the container with Docker Compose:
docker-compose up mlflow-server
To build and run both containers on the same network, run the following command:
docker-compose up
Alternatively, you can start the application containers in detached mode to suppress the containers' log output:
docker-compose up --detach
To stop and remove the containers, run the following command:
docker-compose down --rmi all
Note: The `--rmi all` flag removes all images associated with the containers. If you want to keep the images, remove the flag.
- Jupyter Notebook listens on port `8899` on your `localhost`. You can open the application's main page in your browser at http://localhost:8899, or configure your IDE to connect to the notebook server using the same URL. Note: You'll need to provide the token you set in the `.env` file to access the notebook server.
- The MLflow server listens on port `5001` on your `localhost`. You can open the application's main page in your browser at http://localhost:5001.
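A quick way to confirm that both services are up is to probe the two URLs above. The sketch below uses only the standard library; `is_reachable` and `SERVICES` are hypothetical names, and the ports assume the default host port settings.

```python
import urllib.error
import urllib.request

# Default host URLs from this README; adjust if you changed the host ports.
SERVICES = {
    "jupyter": "http://localhost:8899",
    "mlflow": "http://localhost:5001",
}


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP server answers at url, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status (e.g. auth required).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar problems.
        return False
```

For example, `{name: is_reachable(url) for name, url in SERVICES.items()}` gives a small status report after `docker-compose up --detach`.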
.
├── mlruns
│   ├── 24552641888*****
│   ├── 31988532510*****
│   ├── 61683865883*****
│   └── etc
├── models
│   ├── 1
│   └── 2
├── docker
│   ├── jupyter
│   │   ├── config
│   │   └── Dockerfile
│   └── mlflow
│       ├── bin
│       ├── config
│       └── Dockerfile
├── notebooks
├── docker-compose.yml
├── .env
├── .env.example
├── .gitignore
└── README.md
- To connect to the MLflow server from a Jupyter notebook running on the same network, use the following code:
mlflow.set_tracking_uri("http://mlflow-server:5000")
Note: `mlflow-server` is the name of the container running the MLflow server.
- To connect to the MLflow server from a Jupyter notebook running on a different network, use the following code:
mlflow.set_tracking_uri("http://<host_ip>:5001")
Note: `host_ip` is the IP address of the host machine running the MLflow server. This can be your local machine (`localhost`) or a remote server.
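The two connection modes above can be combined into one small helper, followed by a test run to confirm that tracking works. This is a sketch rather than project code: `resolve_tracking_uri` and `log_demo_run` are hypothetical names, the experiment name, parameter, and metric are placeholders, and the ports assume the default settings.

```python
import os


def resolve_tracking_uri(host_ip=None):
    """Pick a tracking URI for the current environment.

    With host_ip set, target the published host port (different network).
    Otherwise use MLFLOW_TRACKING_URI if defined, falling back to the
    container name on the shared Docker network.
    """
    if host_ip:
        return f"http://{host_ip}:5001"
    return os.environ.get("MLFLOW_TRACKING_URI", "http://mlflow-server:5000")


def log_demo_run(tracking_uri):
    """Log one parameter and one metric to verify the connection."""
    import mlflow  # imported here so the helper above works without mlflow installed

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment("connection-check")  # placeholder experiment name
    with mlflow.start_run():
        mlflow.log_param("source", "jupyter")
        mlflow.log_metric("dummy_metric", 1.0)
```

Inside the notebook container, `log_demo_run(resolve_tracking_uri())` should create a run that then appears in the MLflow UI; from another machine, pass the host's address, for example `resolve_tracking_uri("<host_ip>")`.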
As part of ongoing development, we plan to extend the capabilities of the project to make it more versatile and customizable. Specifically, we plan to add the following features:
- Test PostgreSQL and MySQL as a backend store for storing metadata such as metrics, parameters, and tags.
- Configure AWS S3, Google Cloud Storage, or Azure Blob Storage as artifact stores for storing the model artifacts and other output files generated during the experiments.
- Provide an abstract configuration interface that allows users to easily switch between different backend stores and artifact stores based on their needs and preferences.
- Enhance the integration with other popular ML frameworks and libraries besides PyTorch, such as TensorFlow, to support a wider range of use cases and workflows.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.