Error when running your current image with host drivers cuda 12.1 : "Could not load dynamic library 'libnvinfer.so.7'"

Question

deepcoder opened this issue a year ago · comments

When I run your current image, the library errors shown at the bottom occur. However, when I run a test with 'nvidia-smi', the CUDA 11.6.2 image seems to work under the host's CUDA 12.1.

root@gpu02:~/jupyter# docker run --gpus all nvidia/cuda:11.6.2-cudnn8-runtime-ubuntu20.04 nvidia-smi
==========
== CUDA ==
==========

CUDA Version 11.6.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Mon Jun 19 15:50:29 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 950         Off | 00000000:01:00.0 Off |                  N/A |
| 46%   47C    P0              24W / 125W |      0MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce GTX 960         Off | 00000000:02:00.0 Off |                  N/A |
| 17%   32C    P0              25W / 160W |      0MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
root@gpu02:~/jupyter#

host version info:

root@gpu02:~/jupyter# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
root@gpu02:~/jupyter#

docker run --runtime=nvidia -e TF_MIN_GPU_MULTIPROCESSOR_COUNT=6 -e NVIDIA_VISIBLE_DEVICES=0,1 -p 8848:8888 -it -v $(pwd)/data:/home/jovyan/work -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root cschranz/gpu-jupyter:v1.5_cuda-11.6_ubuntu-20.04_python-only

http://192.168.2.116:8848/lab?token=0b9311c47745956a344958283aaab3960bfff5c06b2ed0e2


[I 2023-06-19 15:46:19.787 ServerApp] Connecting to kernel 4393909e-510c-4a6e-8ad7-a2f8b2be5210.
2023-06-19 15:46:28.602799: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-06-19 15:46:29.301507: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-06-19 15:46:29.301623: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-06-19 15:46:29.301638: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

Christoph · Answer 1 · Fri Jul 14 2023 20:50:24 GMT+0800 (China Standard Time)

Hi,
I'm afraid I don't understand the error yet.
The errors and warnings from TensorFlow in your log can be ignored (I don't like that TensorFlow prints them).
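
To check whether the TensorRT libraries named in those warnings are really absent from the container, one could try to open them by hand from a notebook cell. This is only a minimal sketch and not part of gpu-jupyter: the helper names can_load and missing_libraries are hypothetical, and the library names are copied from the log in the question.

```python
import ctypes

# Library names copied from the TensorFlow warnings in the question's log.
TENSORRT_LIBS = ["libnvinfer.so.7", "libnvinfer_plugin.so.7"]


def can_load(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the file is not found or cannot be loaded.
        return False
    return True


def missing_libraries(names):
    """Return the subset of `names` that the dynamic loader cannot open."""
    return [name for name in names if not can_load(name)]


if __name__ == "__main__":
    for lib in TENSORRT_LIBS:
        print(lib, "loadable" if can_load(lib) else "missing")
```

If both names come back as missing, that matches the log: the warnings concern only the TensorRT (TF-TRT) libraries, not the CUDA driver itself.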

If there is an error caused by using CUDA 11.6 on 12.1 drivers, try an appropriate base image in src/Dockerfile.header. Make sure the base image includes cudnn8 and is a runtime image (e.g. 12.1.1-cudnn8-runtime-ubuntu22.04; see the tags).
Switching to a different base image can be cumbersome, though, because docker-stacks, TensorFlow, PyTorch and some other libraries must support the newer version. It usually takes quite a while, sometimes half a year, until a new driver is supported; see https://github.com/iot-salzburg/gpu-jupyter/tree/master#updates.
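
As a rough aid for choosing a base image tag, the CUDA version in the tag can be compared with the "CUDA Version" that nvidia-smi prints for the host driver (12.2 in the output above), which is the highest CUDA runtime version that driver supports. The sketch below is an illustration under assumptions: the function names are hypothetical, only major and minor versions are compared, and CUDA's minor-version and forward-compatibility mechanisms are ignored.

```python
import re


def cuda_version_from_tag(tag):
    """Extract (major, minor, patch) from a tag like '11.6.2-cudnn8-runtime-ubuntu20.04'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", tag)
    if match is None:
        raise ValueError(f"no CUDA version found in {tag!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def driver_covers(image_tag, driver_cuda_version):
    """Return True if the driver's reported CUDA version is >= the image's (major, minor)."""
    image = cuda_version_from_tag(image_tag)[:2]
    driver = cuda_version_from_tag(driver_cuda_version)[:2]
    return image <= driver


if __name__ == "__main__":
    # Driver value taken from the nvidia-smi header in the question.
    for tag in ("11.6.2-cudnn8-runtime-ubuntu20.04",
                "12.1.1-cudnn8-runtime-ubuntu22.04"):
        print(tag, driver_covers(tag, "12.2"))
```

With the values from this thread, both the 11.6.2 and the 12.1.1 tags are covered by the host driver, which is consistent with nvidia-smi running successfully inside the 11.6.2 container.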