AbhishekRS4 / docker_pytorch

Docker container for PyTorch CUDA

Docker container for PyTorch training

Info

  • This project can be used to build a Docker container and train PyTorch models inside it
  • The base Docker image used in this project can be found here

Building the container

  • Add any additional Python or system dependencies to the Dockerfile
  • Build the image (tagged my_pytorch) by running the following command from the project directory
docker build -t my_pytorch .
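After building, a quick way to confirm that the image can actually reach a GPU is to ask PyTorch from inside a throwaway container. This is a minimal sketch, assuming the image ships python3 with torch installed (as a PyTorch CUDA base image normally does); the function name check_gpu is hypothetical and not part of this repository.

```bash
#!/usr/bin/env bash
# Sketch: check GPU visibility from inside the built image.
# Assumption: the image (default tag my_pytorch) provides python3 with torch.
# check_gpu is a hypothetical helper name used only for this example.

# check_gpu [IMAGE]
# Prints what torch.cuda.is_available() reports inside the container and
# returns 0 only if it reports True.
check_gpu() {
  local image="${1:-my_pytorch}"
  local result
  # Declared separately so the exit status of docker run is not masked by "local".
  result="$(docker run --rm --gpus=all "$image" \
    python3 -c 'import torch; print(torch.cuda.is_available())')" || return 1
  echo "torch.cuda.is_available() -> ${result}"
  [[ "$result" == "True" ]]
}
```

Usage, after sourcing the snippet: check_gpu my_pytorch. If docker run itself fails (for example with the "could not select device driver" error), the function returns 1 without printing a result; if the container starts but reports False, the GPU is not visible to PyTorch inside it.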

Resolving Error

  • If running the container fails with the error docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]. then install the NVIDIA Container Toolkit (nvidia-container-toolkit)
  • The full instructions are on the nvidia-container-toolkit installation guide website; the steps for apt-based systems such as Ubuntu are summarized below
    1. Configure the repository by running the following command:
       curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
         && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
         sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
         sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

    2. Update the package list from the repository:
       sudo apt-get update

    3. Install the NVIDIA Container Toolkit:
       sudo apt-get install -y nvidia-container-toolkit

    4. Configure Docker to use the NVIDIA container runtime:
       sudo nvidia-ctk runtime configure --runtime=docker

    5. Restart the Docker daemon:
       sudo systemctl restart docker
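To check whether the steps above took effect, one can verify that nvidia-ctk is installed and that the Docker daemon configuration now registers an "nvidia" runtime. This is a sketch, assuming nvidia-ctk wrote to the default daemon config at /etc/docker/daemon.json (pass another path if your setup differs); has_nvidia_runtime and check_toolkit are hypothetical helper names, and the check is a simple text match rather than full JSON parsing.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check the NVIDIA Container Toolkit setup.
# Assumption: the runtime entry lives in /etc/docker/daemon.json (the default
# location); has_nvidia_runtime and check_toolkit are hypothetical names.

# has_nvidia_runtime [DAEMON_JSON]
# Returns 0 if the daemon config is readable and mentions an "nvidia" entry.
has_nvidia_runtime() {
  local config="${1:-/etc/docker/daemon.json}"
  [[ -r "$config" ]] || return 1
  grep -q '"nvidia"' "$config"   # plain text match, not a JSON parser
}

# check_toolkit [DAEMON_JSON]
# Reports whether nvidia-ctk is on PATH and whether the runtime is registered.
check_toolkit() {
  local ok=0
  if command -v nvidia-ctk >/dev/null 2>&1; then
    echo "nvidia-ctk: found"
  else
    echo "nvidia-ctk: missing (repeat the install step)"
    ok=1
  fi
  if has_nvidia_runtime "$@"; then
    echo "docker runtime: nvidia registered"
  else
    echo "docker runtime: nvidia not registered (repeat the configure step, then restart docker)"
    ok=1
  fi
  return "$ok"
}
```

Usage: check_toolkit (or check_toolkit /path/to/daemon.json). Remember that the daemon only picks up a changed configuration after it is restarted.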
    

Running training with PyTorch

  • Copy the directory containing the dataset files into the project directory; since the whole project directory is mounted into the container, the dataset becomes accessible there without a separate mount
  • The following example command runs the training
docker run --rm -it --init \
  --gpus=all \
  --ipc=host \
  --user="$(id -u):$(id -g)" \
  --volume="$PWD:/app" \
  my_pytorch python3 modeling/train.py --dir_dataset /app/dir_dataset/
  • The --volume option mounts a host directory into the container; here, $PWD on the host (the project directory) is mounted at /app in the container
  • In the above example, my_pytorch is the name of the Docker image, and dir_dataset is the directory containing the dataset files, placed inside the project directory ($PWD on the host), so it is available at /app/dir_dataset/ in the container
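The docker run command for training can be wrapped in a small script that checks the dataset is in place before starting the container. This is a sketch, not part of the repository: IMAGE, DATASET_DIR and DRY_RUN are hypothetical environment variables, while the flags, modeling/train.py and --dir_dataset are taken from the example command. With DRY_RUN=1 it only prints the command.

```bash
#!/usr/bin/env bash
# Sketch of a wrapper around the training command.
# IMAGE, DATASET_DIR and DRY_RUN are hypothetical variables for this sketch;
# the docker flags and modeling/train.py --dir_dataset come from the example.

# build_run_cmd: fills the global array RUN_CMD with the docker run arguments.
build_run_cmd() {
  local image="${IMAGE:-my_pytorch}"
  local dataset="${DATASET_DIR:-dir_dataset}"
  RUN_CMD=(
    docker run --rm -it --init
    --gpus=all                    # expose all host GPUs
    --ipc=host                    # share host IPC namespace (more shared memory for DataLoader workers)
    --user="$(id -u):$(id -g)"    # files written under /app stay owned by the host user
    --volume="$PWD:/app"          # project directory, including the dataset, at /app
    "$image"
    python3 modeling/train.py --dir_dataset "/app/${dataset}/"
  )
}

# run_training: checks that the dataset directory exists under $PWD, then runs
# the command, or only prints it (shell-quoted) when DRY_RUN=1.
run_training() {
  local dataset="${DATASET_DIR:-dir_dataset}"
  if [[ ! -d "$PWD/$dataset" ]]; then
    echo "error: $PWD/$dataset not found; copy the dataset into the project directory" >&2
    return 1
  fi
  build_run_cmd
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%q ' "${RUN_CMD[@]}"
    echo
  else
    "${RUN_CMD[@]}"
  fi
}
```

Usage, from the project directory after sourcing the snippet: DRY_RUN=1 run_training to inspect the command, then run_training to start training. Building the arguments as a bash array keeps values such as a $PWD containing spaces intact as single arguments.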


Languages

Dockerfile 100.0%