lobantseff / torch-distributed-training

Machine learning project template

This template streamlines the routine of Docker image preparation for a typical deep learning project. Its core idea is usability: after just a few steps you are ready to run your experiments!

If you need to share the container, share this folder along with the tarred Docker image (docker save my-image:latest > my-image.tar). Your counterpart can then run it with bash docker_start.(sh | ps1) and, voila, they get the same environment as you!

This version of the template is based on NVIDIA PyTorch Release 22.03

Requirements:

Linux:

Windows:

Build image

  1. Add proxy settings to ~/.docker/config.json:

     {
         "proxies": {
             "default": {
                 "httpProxy": "http://address.proxy.com:8888/",
                 "httpsProxy": "http://address.proxy.com:8888/"
             }
         }
     }
    
  2. Rename ./src/ml_project_template to a custom name. After the build you can import this module in Python. You can add as many modules to ./src as you want before the build. Do not forget that each module must include an __init__.py to be picked up.

  3. Add pip install arguments to requirements.txt. The file is consumed during the build with xargs -L 1 pip install --no-cache-dir < requirements.txt, which means each line is executed as pip install --no-cache-dir <line> (see the example requirements.txt after this list).

  4. Add pip-installable libraries to ./libs. They will be installed during the build with pip install -e <lib> and can then be imported directly in Python.

  5. Build image

  • In Linux shell or WSL: bash docker_build.sh
  • Follow the prompts. The workspace dir is a directory on the host machine; please provide the full path.
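
As an illustration of step 3, a hypothetical requirements.txt might look like the sketch below (the package names and the extra index URL are examples, not part of the template). Because xargs passes each whole line to pip, per-package flags are allowed; note that comment lines would also be forwarded to pip, so it is safer to avoid them in this file:

    scikit-learn
    albumentations==1.1.0
    some-private-package --extra-index-url https://pypi.example.com/simple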

Start container

  • In Linux shell or WSL: bash docker_start.sh

  • Follow the prompts. You will be asked to define IMAGE_NAME, CONTAINER_NAME, JUPYTER_PORT, TENSORBOARD_PORT, and SSH_PORT. The ports you are asked to set up are host ports: ask your system admin for available ports if you work on a remote server, or pick free ports if you work on a local machine (an illustrative port mapping is sketched after this list).

    • Jupyter Lab is available at: http://localhost:<JUPYTER_PORT>/lab
    • Jupyter Notebook is available at: http://localhost:<JUPYTER_PORT>/tree
    • Tensorboard is available at: http://localhost:<TENSORBOARD_PORT>, monitoring experiments in $tb.
    • Connect to the container via SSH: ssh -p <SSH_PORT> root@localhost (if you are behind a proxy, this mode has no connection to the outside world, so no package installation is possible)
    • Inspect the container: docker exec -it <CONTAINER_NAME> bash (if you are behind a proxy, install packages in this mode)
    • Stop the container: docker stop <CONTAINER_HASH>
    • Inside the container $ws will be available at /ws
  • If you want to define additional docker run parameters, just provide them after the command.
    For example: bash docker_start.sh -p 9898:9898
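
Conceptually, the prompts boil down to a port mapping roughly like the sketch below. This is an illustration, not the exact contents of docker_start.sh: the container-side ports (8888 for Jupyter, 6006 for TensorBoard, 22 for SSH), the --gpus flag, and the -v <WORKSPACE_DIR>:/ws mount (mirroring the $ws → /ws mapping described above) are assumptions here. The left-hand side of each -p is the host port you choose at the prompt:

    docker run -d --gpus all --name <CONTAINER_NAME> \
        -p <JUPYTER_PORT>:8888 \
        -p <TENSORBOARD_PORT>:6006 \
        -p <SSH_PORT>:22 \
        -v <WORKSPACE_DIR>:/ws \
        <IMAGE_NAME>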

Connect IDE to the running container: see the VSCode note in the Notes section below.

Update image

  • You can access the container with the command: docker exec -it <CONTAINER_NAME> bash
  • Then install as many pip packages as you want (do not forget to add them to requirements.txt as well)
  • Finally, update the image with: docker commit --change='CMD ~/init.sh' <CONTAINER_NAME> <IMAGE_NAME> (a worked example follows)
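
A worked example of this update cycle, with hypothetical names (my-container, my-image) and an example package:

    docker exec -it my-container bash
    # --- inside the container ---
    pip install albumentations        # example package; also add it to requirements.txt
    exit
    # --- back on the host ---
    docker commit --change='CMD ~/init.sh' my-container my-image:latest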

Share image

  • Share the repo and then either build the image on the new machine, or save the image and load it on the new machine (see the example workflow below):
  • docker save <IMAGE_NAME>:latest > my-image.tar
  • docker load < my-image.tar
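
For example, moving the image to another machine over SSH could look like this (the image name, user, and host are hypothetical; any file transfer method works):

    # on the current machine
    docker save my-image:latest > my-image.tar
    scp my-image.tar user@new-machine:~/
    # on the new machine
    docker load < my-image.tar
    bash docker_start.sh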

Notes:

  • On a Windows machine: if PowerShell says "execution of scripts is disabled on this system", you can run Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope LocalMachine in PowerShell with admin rights to allow script execution. Do this with caution, since some scripts can be malicious. For details, follow the link.
  • You can attach VSCode to a running container: quick tutorial, documentation
  • To commit updates from a running container to the built image use:
    docker commit --change='CMD ~/init.sh' updated_container_name_or_hash docker_image_name
    (not recommended as a daily practice; the good practice is to update environment.yaml, requirements.txt, or the Dockerfile)

Project structure and the philosophy behind it

The idea behind this template is to be able to store lightweight code and heavy model artifacts and data in different places.

  # Code folder. Available under `/code` inside the container
  template-ml-project/
  ├── libs/
  │   ├── external_lib_as_submodule1/
  │   └── external_lib_as_submodule2/
  ├── src/
  │   ├── custom_module1/
  │   │   └── __init__.py
  │   └── custom_module2/
  │       └── __init__.py
  ├── notebooks/
  │   └── jupyter_notebook_example.ipynb
  ├── .gitignore
  ├── Dockerfile
  ├── README.md
  ├── docker_build.ps1
  ├── docker_build.sh
  ├── docker_run.ps1
  ├── docker_run.sh
  ├── environment.yaml
  ├── requirements.txt
  ├── set_jupyter_password.py
  └── setup.py
  
  # Workspace folder. Available under `/ws` inside the container.
  template-ml-project-workspace/
  ├── data_raw/
  │   └── file.dcm
  ├── data_processed/
  │   └── file.npz
  ├── artifacts/
  │   ├── segmentation_masks/
  │   │   └── mask.jpg
  │   └── checkpoints/
  ├── configs/
  └── etc/
