This project deploys a Large Language Model (LLM) using Docker with NVIDIA GPU support. Below are the steps to set up and deploy the model.
Grant Docker Access to NVIDIA GPU:
- Follow the link below to install the NVIDIA Container Toolkit, which Docker needs in order to use the system's NVIDIA GPU. This step is required for GPU acceleration within the Docker environment.
- Link: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
Configure Model:
- Open `constants.py` and change the model name to the LLM you want to run. This allows flexibility in choosing different models for deployment; a hypothetical sketch of the file is shown below.
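The actual contents of `constants.py` are not shown in this README, so the snippet below is only a sketch of what the setting might look like. The variable name `MODEL_ID` and the example model identifier are assumptions, not the repository's real values.

```python
# constants.py -- hypothetical sketch; the real file's variable names may differ.
# MODEL_ID is assumed to be a Hugging Face model identifier.
MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # swap in the model you want to deploy
```

Under this assumption, switching to a different model only means editing this one constant and rebuilding the image.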
Set Hugging Face Access Token:
- Create a `.env` file and set your Hugging Face access token. This token is required for accessing models from the Hugging Face Model Hub; one common way an app might load it is sketched below.
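How this project reads the `.env` file is not shown here. The sketch below assumes the common pattern of loading it with `python-dotenv` and authenticating via `huggingface_hub`; the variable name `HUGGINGFACE_TOKEN` is an assumption and must match whatever key the project actually expects.

```python
# Hypothetical sketch of loading the token at startup.
# Assumes a .env line like: HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxx
# Assumes python-dotenv and huggingface_hub are installed;
# the environment variable name is an assumption.
import os

from dotenv import load_dotenv
from huggingface_hub import login

load_dotenv()                                 # reads .env into the environment
login(token=os.environ["HUGGINGFACE_TOKEN"])  # authenticates with the Hugging Face Hub
```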
Build Docker Image:
- Run `docker build -t <image_name> .` to build the Docker image. This image will contain all necessary dependencies and configurations for deploying the LLM.
Run Docker Container:
- Launch the Docker container with NVIDIA runtime support using the command:
  `docker run -p 8000:8000 --runtime=nvidia --gpus all <image_name>`
- Replace `<image_name>` with the name of the Docker image built in the previous step.
- This command exposes port 8000 for interacting with the deployed LLM; a quick smoke test is sketched below.
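The exact API routes depend on the serving framework, which this README does not specify, so the sketch below only verifies that something is answering on port 8000. The URL path is an assumption; adjust it for your setup.

```python
# Minimal smoke test: checks that the container is serving on port 8000.
# The root path "/" is an assumption; the real endpoints depend on the
# serving framework used in the image.
import requests

try:
    response = requests.get("http://localhost:8000", timeout=10)
    print(f"Server responded with HTTP {response.status_code}")
except requests.ConnectionError:
    print("No response on port 8000; is the container running?")
```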
Notes:
- Ensure Docker is properly installed on your system before proceeding.
- Make sure the NVIDIA GPU drivers are installed and up to date.
- Verify that the Hugging Face access token is valid and correctly set in the `.env` file.
- Run `nvidia-smi` to confirm that the GPU is being used; a complementary in-container check is sketched below.
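While `nvidia-smi` checks GPU usage at the driver level on the host, the sketch below checks visibility from inside the application. It assumes the image ships PyTorch, which this README does not confirm.

```python
# GPU visibility check from inside the container.
# Assumes PyTorch is installed in the image (an assumption, not confirmed here).
import torch

if torch.cuda.is_available():
    print(f"CUDA available: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA device visible; check the --gpus/--runtime flags and drivers")
```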