lucasvmigotto / pucmg-pos-inteligencia-artificial

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

template-datascience

Template for Data Science, Machine Learning and Deep Learning projects

Summary

  1. Start Jupyter with Docker
  2. Python base env to src/ dir
  3. Project organization

Start Jupyter with Docker

Image options

Give it a try with the tag python-3.11 or ubuntu-22.04

  1. Machine Learning

    • Python, R and Julia

      • Pull the image

        docker pull quay.io/jupyter/datascience-notebook
    • Python

      • Pull the image

        docker pull
    • R

      • Pull the image

        docker pull quay.io/jupyter/r-notebook
    • Julia

      • Pull the image

        docker pull quay.io/jupyter/julia-notebook
  2. Deep Learning

    • TensorFlow

      docker pull quay.io/jupyter/tensorflow-notebook
    • PyTorch

      docker pull quay.io/jupyter/pytorch-notebook
  3. PySpark

    • Pull the image

      docker pull quay.io/jupyter/pyspark-notebook

Docker run command base

Run the following command inside your project folder to start the container

docker run \
    --name 'notebook' \
    --hostname 'notebook' \
    --workdir '/app' \
    --volume './:/app:delegated' \
    --network 'bridge' \
    --publish '8888:8888' \
    --rm \
    {{ YOUR IMAGE AND TAG GOES HERE }}

The container will interrupt your terminal. If you want to avoid this, use the -d after the run command to run the container in detach mode. Take notice that, if in detach mode, the token used to connect to the server will not be displayed.

Bash aliases

Given the fact that is a long command to type every time, you maybe would like to create bash aliases to make things easier. The following code demonstrate an example of how to do it:

Depend if you use some other shell (e.g. ZSH), you will need edit the code to redirect the output to the correct rc file

PLEASE PAY ATTENTION, make sure to use >> instead of >. Otherwise, you will overwrite EVERYTHING inside your shell rc file

cat <<EOF >> $HOME/.bashrc
_BASE_CMD="docker run --name 'notebook' --hostname 'notebook' --volume './:/home/jovyan' --network 'bridge' --publish '8888:8888' --env GRANT_USER=yes --rm"
alias docker-notebook-ml="${_BASE_CMD} quay.io/jupyter/datascience-notebook:python-3.11"
alias docker-notebook-dl-tf="${_BASE_CMD} quay.io/jupyter/tensorflow-notebook:python-3.11"
alias docker-notebook-dl-pt="${_BASE_CMD} quay.io/jupyter/pytorch-notebook:python-3.11"
alias docker-notebook-python="${_BASE_CMD} quay.io/jupyter/scipy-notebook:python-3.11"
alias docker-notebook-r="${_BASE_CMD} quay.io/jupyter/r-notebook:python-3.11:python-3.11"
alias docker-notebook-julia="${_BASE_CMD} quay.io/jupyter/julia-notebook:python-3.11"
alias docker-notebook-ps="${_BASE_CMD} quay.io/jupyter/pyspark-notebook:python-3.11"
EOF

Notes

  • If not modified, the default config starts Juyter on 127.0.0.1:8888
  • If you're interested in using VS Code as Jupyter Notebook's editor, make sure to get URL and Token credential in the container logs to successfully connect to the kernel. Look for something like: http://127.0.0.1:8888/lab?token=<TOKEN_HASH>

Python base env to src/ dir

How to use with Docker

  • Starting the project

    docker compose up [-d] app
  • Enter the command line interface with bash inside the container

    docker compose run --rm app

Project organization

    ├── LICENSE
    ├── README.md          <- The top-level README for developers using this project.
    ├── data
    │   ├── external       <- Data from third party sources.
    │   ├── interim        <- Intermediate data that has been transformed.
    │   ├── processed      <- The final, canonical data sets for modeling.
    │   └── raw            <- The original, immutable data dump.
    │
    ├── models             <- Trained and serialized models, model predictions, or model summaries
    │
    ├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
    │                         the creator's initials, and a short `-` delimited description, e.g.
    │                         `1.0-jqp-initial-data-exploration`.
    │
    └── src                <- Source code for use in this project.
        │
        ├── data           <- Scripts to download or generate data
        │   └── make_dataset.py
        │
        ├── features       <- Scripts to turn raw data into features for modeling
        │   └── build_features.py
        │
        ├── models         <- Scripts to train models and then use trained models to make
        │   │                 predictions
        │   ├── predict_model.py
        │   └── train_model.py
        │
        └── visualization  <- Scripts to create exploratory and results oriented visualizations
            └── visualize.py

Project based on the cookiecutter data science project template. #cookiecutterdatascience

Lucas Vidor Migotto - December, 2023

About

License:MIT License


Languages

Language:Jupyter Notebook 100.0%Language:Dockerfile 0.0%