This no longer applies to BentoML releases after 1.0; it was mainly for prototyping pre-1.0
- managed their Docker images from an entrypoint.sh and a build.py
- drawbacks:
    - their build scripts mainly cp the built binary into the model and also include a CUDA-enabled image as their base layer - not elegant and hard to maintain for developers
    - dependent on the image having python built in
after packing, edit the Dockerfile as follows for GPU support:
FROM nvidia/cuda:11.0-cudnn8-runtime-ubuntu16.04 as nvidia-cuda
...
COPY --from=nvidia-cuda /usr/local/cuda-11.0 /usr/local/cuda
COPY --from=nvidia-cuda /usr/lib/x86_64-linux-gnu/libcudnn* /usr/local/cuda/lib64/
# apparently tensorflow needs this linked in order to use GPU
RUN ln /usr/local/cuda/lib64/libcusolver.so.10 /usr/local/cuda/lib64/libcusolver.so.11
ENV PATH=/usr/local/cuda/bin:$PATH
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64
...
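a quick sanity check that the copied libraries resolve inside the built image - a minimal Python sketch, assuming python is available in the container; the sonames below are assumptions and may differ per base image:
import ctypes
# hypothetical soname list - adjust to the CUDA/cuDNN versions actually installed
for lib in ("libcudart.so.11.0", "libcudnn.so.8", "libcusolver.so.11"):
    try:
        ctypes.CDLL(lib)
        print(f"{lib}: loaded")
    except OSError as exc:
        print(f"{lib}: {exc}")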
relevant files can be found under onnx
In order to run inference on GPU, users must use onnxruntime-gpu, as the library will automatically allocate GPU resources to run inference and fall back to CPU if needed. Users can check whether they have GPU support with get_providers():
...
# for ONNXModelArtifacts, session=self.artifacts.model
cuda = "CUDA" in session.get_providers()[0] # True
relevant files can be found under pytorch
import torch
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)
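inputs have to live on the same device as the model - a minimal sketch with a stand-in nn.Linear model and a dummy input:
import torch
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4, 2).to(device)  # stand-in for the packed model
model.eval()
x = torch.randn(1, 4, device=device)  # input created on the same device
with torch.no_grad():
    y = model(x)
print(y.device)  # cuda:0 when a GPU is available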
relevant files can be found under tf
import tensorflow as tf
gpu = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpu[0], True)
# or using `with` statement:
with tf.device("/GPU:0"):
...
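a slightly fuller sketch that guards against machines with no visible GPU; the matmul is only an illustrative op:
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)  # avoid grabbing all GPU memory up front
    with tf.device("/GPU:0"):
        a = tf.random.uniform((1024, 1024))
        b = tf.linalg.matmul(a, a)
    print(b.device)  # .../device:GPU:0
else:
    print("no GPU visible, running on CPU")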
- errors when running yatai-start
- How do people usually use GPU in the wild?
- wtf is kubeflow?
setting up minikube https://minikube.sigs.k8s.io/docs/tutorials/nvidia_gpu/
- the driver has to be manually installed, as referenced here:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
- check kubeadm init
- check kubelet systemd init
- minikube start --driver=kvm2 --kvm-gpu --docker-opt {all the device opts}
- in order to run with GPU we need to do PCI passthrough, and this requires an unbound GPU. I might have to set this up later since my current GPU is bound to the Xorg session.
- configuring pods to consume GPUs:
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
    - name: my-gpu-container
      image: nvidia/cuda:10.0-runtime-ubuntu18.04
      command: ["/bin/bash", "-c", "--"]
      args: ["while true; do sleep 600; done;"]
      resources:
        limits:
          nvidia.com/gpu: 2
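for reference, the same pod could be created programmatically - a minimal sketch using the kubernetes Python client, assuming the package is installed and a local kubeconfig (e.g. the one minikube writes) is available:
from kubernetes import client, config
config.load_kube_config()  # uses the local kubeconfig
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="my-gpu-pod"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="my-gpu-container",
                image="nvidia/cuda:10.0-runtime-ubuntu18.04",
                command=["/bin/bash", "-c", "--"],
                args=["while true; do sleep 600; done;"],
                # GPUs are requested through resource limits only
                resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "2"}),
            )
        ]
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)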
web UI -> choose a higher-level API -> if not, then we structure a way to use the lower-level API
overwrite -> proposal for docker images
recent updates from the systemd re-architecture broke nvidia-docker, see #1447. This issue is confirmed to be patched in future releases.
current workaround:
# for debian users, disable the unified cgroup hierarchy by setting GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=0"
# for arch users, uncomment and set no-cgroups = true under /etc/nvidia-container-runtime/config.toml
# one can just run the below command
docker run --gpus all --device /dev/nvidia0 --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools --device /dev/nvidia-modeset --device /dev/nvidiactl ...
# or set up a docker-compose.yml and add:
devices:
- /dev/nvidia0:/dev/nvidia0
- /dev/nvidiactl:/dev/nvidiactl
- /dev/nvidia-modeset:/dev/nvidia-modeset
- /dev/nvidia-uvm:/dev/nvidia-uvm
- /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools
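the same workaround can also be expressed through the docker Python SDK - a minimal sketch, assuming the docker package is installed; the command is just a placeholder:
import docker
client = docker.from_env()
output = client.containers.run(
    "nvidia/cuda:10.0-runtime-ubuntu18.04",
    "nvidia-smi",  # placeholder command to verify GPU visibility
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],  # same as --gpus all
    devices=[  # explicit device nodes, mirroring the workaround above
        "/dev/nvidia0:/dev/nvidia0",
        "/dev/nvidiactl:/dev/nvidiactl",
        "/dev/nvidia-modeset:/dev/nvidia-modeset",
        "/dev/nvidia-uvm:/dev/nvidia-uvm",
        "/dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools",
    ],
    remove=True,
)
print(output.decode())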