dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T


Docker fails to create container after upgrading docker on JetPack 4.6

LukaTri opened this issue · comments

I upgraded docker using sudo apt-get update/upgrade, and now when I try to run the nvcr.io/nvidia/l4t-ml:r32.6.1-py3 container, I get this error message:

`docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall clone3: permission denied: unknown.`

If I have to downgrade docker to the previous version, how do I do that? And if not, what can I do to fix this error?

JetPack 4.6 on Jetson Nano - I'm getting the same error when building containers (with the default runtime set to nvidia) that previously built fine. This affects all of the NVIDIA containers I use, including nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3, nvcr.io/nvidia/l4t-base:r32.6.1 & dustynv/ros:noetic-ros-base-l4t-r32.6.1

Is this related perhaps? NVIDIA/nvidia-container-runtime#157

Yes, that seems to be the same issue. I will look through that thread and post any solutions/updates here. Thanks.

I ran into this today as well (what a bad time to update), and spent a couple of hours fiddling with things to attempt to fix it.

Downgrading is probably the easiest approach at the moment. The newest version of the nvidia-container-toolkit that fixes this problem is currently in experimental, and packages haven't even been built for arm64 yet (I let them know on that linked issue). You can download a .deb file of the last version of Docker 19 at https://launchpad.net/ubuntu/bionic/arm64/docker.io/19.03.6-0ubuntu1~18.04.3.

You'll probably also have to downgrade containerd to 1.5.2 by doing

apt install containerd=1.5.2-0ubuntu1~18.04.3

You may also want to pin docker.io to version 19 and containerd to 1.5.2 for now so they don't get updated again until things are sorted out (or until JetPack 5.0 releases sometime next year and we get a slightly less crusty version of Ubuntu). You can do that by editing /etc/apt/preferences and adding:

Package: docker.io
Pin: version 19.03*
Pin-Priority: 1001

Package: containerd
Pin: version 1.5.2*
Pin-Priority: 1001
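
A quick way to confirm the pins took effect is apt-cache policy, which should show the pinned releases as the candidate versions (a pin priority above 1000 also lets apt downgrade to the pinned version if a newer one is already installed):

$ apt-cache policy docker.io containerd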

Thank you, that worked perfectly. I did just as you said: I downloaded the docker.io 19.03 package from the link you provided, then ran the downgrade for containerd. I also added the pins to /etc/apt/preferences to ignore docker and containerd updates for now.

I had an error when I tried to downgrade to Docker 19 before I downgraded containerd, so you should downgrade containerd first,

$ sudo apt install containerd=1.5.2-0ubuntu1~18.04.3

and then Docker 19.

$ sudo apt install ./docker.io_19.03.6-0ubuntu1~18.04.3_arm64.deb

Also, you can pin the versions of docker.io and containerd in the following way, without editing /etc/apt/preferences:

$ sudo apt-mark hold docker.io containerd

If the problem is solved and you want to upgrade them,

$ sudo apt-mark unhold docker.io containerd
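
If you want to double-check which packages are currently held, apt-mark can list them:

$ apt-mark showhold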

Do we know if an update has been published to solve this problem? If not, do we know when we can expect it to be published?

@JeremieBourque1 I am unaware of any updates that have fixed this issue. I am still using the workaround from this thread:

  1. Downgrading containerd with sudo apt install containerd=1.5.2-0ubuntu1~18.04.3
  2. Downloading the docker.io package, then installing it with sudo apt install ./docker.io_19.03.6-0ubuntu1~18.04.3_arm64.deb
  3. Holding the docker.io and containerd packages with sudo apt-mark hold docker.io containerd

If you do find that an update has been released that addresses this issue, please post it here!

We have released a fix for this, here are the steps to run it:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install nvidia-docker2=2.8.0-1

Let me know if that works for you guys.
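
If you want to verify that the new packages took effect, checking that the nvidia runtime is still registered with Docker should be enough (exact output varies by Docker version):

$ sudo docker info | grep -i runtime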

@dusty-nv It works for me, thank you!

This worked for the Nanosaur installation (nanosaur.ai) and cleared the "docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall clone3: permission denied: unknown" error for nanosaur run.

However, please encourage the effort to get current Docker releases to work with JP4.6 on the Jetson Nano 4GB & 2GB so folks don't have to apply a patch.


It worked perfectly.


Perfect! This came right on time for me.


It works!

$ printenv | grep JETPACK
JETSON_JETPACK=4.6
$ sudo docker run --rm --runtime nvidia -it nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3
root@afb2a19c93bc:/# python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True


it works! thank you.

Works perfectly.
Thank u!


I'm facing this issue and the above instructions do not solve the problem.
Environment: Xavier NX

$ printenv | grep JETPACK
JETSON_JETPACK=4.6
$ python3
Python 3.6.9 (default, Mar 15 2022, 13:55:28) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> 
$ sudo docker run --rm --runtime nvidia -it nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3

root@18d14c453369:/# python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 196, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 149, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory

OSError: libcurand.so.10: cannot open shared object file: No such file or directory

Hi @iamjagan, I believe this is either an unrelated error, or that you are missing CUDA toolkit, or missing the CSV files under /etc/nvidia-container-runtime/host-files-for-container.d/
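
As a quick sanity check, these commands (assuming the default JetPack install paths) show whether the CUDA libraries and the container CSV files are actually present on the host:

$ ls /usr/local/cuda/lib64/libcurand.so*
$ ls /etc/nvidia-container-runtime/host-files-for-container.d/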

Hey there, I think I'm experiencing a related issue. When trying to execute my code within the container on Jetson Nano, I'm faced with this error:

Traceback (most recent call last):
  File "main.py", line 2, in <module>
    import consumer
  File "/jetson-inference/Keeper/consumer.py", line 15, in <module>
    from MotionOrchestrator import MotionOrchestrator
  File "/jetson-inference/Keeper/MotionOrchestrator.py", line 34, in <module>
    import jetson.inference
  File "/usr/lib/python3.7/dist-packages/jetson/inference/__init__.py", line 8, in <module>
    from jetson_inference_python import *
ImportError: /usr/lib/aarch64-linux-gnu/libnvinfer.so.7: file too short

I tried the fix @dusty-nv mentioned above, but that didn't seem to have an effect. This code used to execute without any issues until I tried deploying it again yesterday which is when I noticed this issue pop up.

This code used to execute without any issues until I tried deploying it again yesterday which is when I noticed this issue pop up.

Hi @Jean-Lytehouse, had you upgraded your JetPack-L4T version in the meantime? Are you able to run python3 -c 'import tensorrt' from within the container?

If you do an ls -ll /usr/lib/aarch64-linux-gnu/libnvinfer.so.7 from inside & outside the container, are the files there?

Hey @dusty-nv! Thanks for the response!

had you upgraded your JetPack-L4T version in the meantime?

Sorry, how do I go about doing this?

Are you able to run python3 -c 'import tensorrt' from within the container?

I get a ModuleNotFoundError: No module named tensorrt

If you do an ls -ll /usr/lib/aarch64-linux-gnu/libnvinfer.so.7 from inside & outside the container, are the files there?

Outside the container - no, inside the container, yes, but interestingly the file it is symlinked to has a size of 0.

Outside the container - no, inside the container, yes, but interestingly the file it is symlinked to has a size of 0.

OK, it would seem like you are missing the TensorRT packages on your device. Interesting that this was working for you before. Which version of JetPack-L4T are you running? (cat /etc/nv_tegra_release)

You can try sudo apt-get install nvidia-tensorrt tensorrt python3-libnvinfer-dev

OK, it would seem like you are missing the TensorRT packages on your device. Interesting that this was working for you before. Which version of JetPack-L4T are you running? (cat /etc/nv_tegra_release)

My growing suspicion is that our Jetson Nano vendor (Aaeon) may have changed the image. It seems like JetPack might not be installed; the above command didn't yield anything hinting at a JetPack version. I also tried running sudo apt-cache show nvidia-jetpack, which yielded unable to locate package nvidia-jetpack; I then tried sudo apt install nvidia-jetpack, which yielded the same thing.

You can try sudo apt-get install nvidia-tensorrt tensorrt python3-libnvinfer-dev

Outside the container, this yielded unable to locate package nvidia-tensorrt; inside the container it yielded unable to locate package for nvidia-tensorrt, tensorrt, and python3-libnvinfer-dev.

It seems like JetPack might not be installed; the above command didn't yield anything hinting at a JetPack version.

Was the /etc/nv_tegra_release file missing, or what did it show? It should contain the L4T version (which corresponds to a JetPack version).

What is strange is the code used to work for you, but I don't understand how if libnvinfer.so is missing outside of the container. This gets mounted into the container by the nvidia docker runtime when --runtime nvidia is used
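
For anyone following along, the set of host files mounted with --runtime nvidia comes from the CSV files under /etc/nvidia-container-runtime/host-files-for-container.d/, so (assuming the TensorRT CSV is installed) you can check whether libnvinfer is expected to be mounted with something like:

$ grep libnvinfer /etc/nvidia-container-runtime/host-files-for-container.d/tensorrt.csv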


It says
# R32 (release), REVISION: 6.1, GCID: 2783751, BOARD: t210ref, EABI: aarch64, DATE: Mon Jul 26 2021

I should have clarified on what I meant when I said it was working before. We've got multiple Nano deployments, several of them in the wild. In the past our deployment process was to pull our containers onto Nanos we got from our supplier, and it was good to go. We got this batch of Nanos from our supplier, and now the code doesn't run. So to be clear, this code has not run before on this specific Nano.

Based on this conversation, it seems like Jetpack may not have come installed on these devices? Or am I barking up the wrong tree?

I hope that helps, and thanks so much for your help!

Based on this conversation, it seems like Jetpack may not have come installed on these devices? Or am I barking up the wrong tree?

That would seem to be the case - you have L4T R32.6.1 (which corresponds to JetPack 4.6), but seemingly not the JetPack components like the CUDA Toolkit, etc.

You can try adding this file to your apt sources and then running sudo apt-get update:

$ cat /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
deb https://repo.download.nvidia.com/jetson/common r32.6 main
deb https://repo.download.nvidia.com/jetson/t194 r32.6 main

Then you should be able to do things like sudo apt-get install libnvinfer-dev nvidia-container-csv-tensorrt
Installing those two packages should hopefully bring in the dependencies needed for using TensorRT inside the containers.
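
Once that finishes, something like the following should list the libnvinfer packages if TensorRT landed correctly (exact versions will differ):

$ dpkg -l | grep -i nvinfer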


I think we're getting close! I saw that those 2 lines in that file were commented out for some reason. I uncommented them and ran the suggested commands. I now have libnvinfer.so and libnvinfer.so.8 outside of the container, but no libnvinfer.so.7, which is what the code complains about when I try to run it from within the container:

ImportError: /usr/lib/aarch64-linux-gnu/libnvinfer.so.7: file too short

Guessing there's a version mismatch going on somewhere?

I now have libnvinfer.so and libnvinfer.so.8 outside of the container, but no libnvinfer.so.7, which is what the code complains about when I try to run it from within the container:

ImportError: /usr/lib/aarch64-linux-gnu/libnvinfer.so.7: file too short

Are you running an r32.6.1 container? libnvinfer.so.7 would be from an older container I think
Do you have libnvinfer.so.8 inside the container?

Are you able to import tensorrt inside an r32.6.1 container?

$ sudo docker run -it --rm --net=host --runtime nvidia nvcr.io/nvidia/l4t-base:r32.6.1
# python3 -c 'import tensorrt'

OSError: libcurand.so.10: cannot open shared object file: No such file or directory

Hi @iamjagan, I believe this is either an unrelated error, or that you are missing CUDA toolkit, or missing the CSV files under /etc/nvidia-container-runtime/host-files-for-container.d/

This is not an unrelated issue. I started getting this error as soon as I used your solution to get docker working. I'm definitely not missing the CUDA toolkit on the host machine, but it is missing inside the docker container. So please let me know which CSV files we are meant to have under /etc/nvidia-container-runtime/host-files-for-container.d/

So please let me know which CSV files we are meant to have under /etc/nvidia-container-runtime/host-files-for-container.d/

$ ls /etc/nvidia-container-runtime/host-files-for-container.d/
cuda.csv  cudnn.csv  l4t.csv  tensorrt.csv  visionworks.csv

These are the related apt packages that you could try removing/purging and re-installing:

$ apt-cache search nvidia-container-*
libnvidia-container-tools - NVIDIA container runtime library (command-line tools)
libnvidia-container0 - NVIDIA container runtime library
libnvidia-container1 - NVIDIA container runtime library
nvidia-container-csv-cuda - Jetpack CUDA CSV file
nvidia-container-csv-cudnn - Jetpack CUDNN CSV file
nvidia-container-csv-tensorrt - Jetpack TensorRT CSV file
nvidia-container-csv-visionworks - Jetpack VisionWorks CSV file
nvidia-container-runtime - NVIDIA container runtime
nvidia-container-toolkit - NVIDIA container runtime hook
nvidia-container - NVIDIA Container Meta Package
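
For example, re-installing the CSV packages (a sketch - adjust the list to whatever is actually missing on your system) would look something like:

$ sudo apt-get install --reinstall nvidia-container-csv-cuda nvidia-container-csv-cudnn nvidia-container-csv-tensorrt nvidia-container-csv-visionworks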


When I try the nvidia-docker2 fix above, I get: Unable to locate package nvidia-docker2


Hi, novice here.
I am trying to run the "Getting started with AI on Jetson Nano" course and am currently stuck on the "Headless Device Mode" section. I have JetPack 4.4.1 installed and run the following with the appropriate container tag v2.0.1-r32.4.4:

# create a reusable script
echo "sudo docker run --runtime nvidia -it --rm --network host \
    --volume ~/nvdli-data:/nvdli-nano/data \
    --volume /tmp/argus_socket:/tmp/argus_socket \
    --device /dev/video0 \
    nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.4.4" > docker_dli_run.sh

# make the script executable
chmod +x docker_dli_run.sh

# run the script
./docker_dli_run.sh

I get the same error:

docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall clone3: permission denied: unknown

Will your fix work in my case?

Will your fix work in my case?

I believe that it should - this is the same error. My guess is that at some point you had run apt-get upgrade and it upgraded these docker packages.
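
One quick way to check which docker-related versions are currently installed (and so whether that upgrade happened) is:

$ dpkg -l docker.io containerd nvidia-docker2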



I was able to resolve that error using this thread: https://forums.developer.nvidia.com/t/docker-isnt-working-after-apt-upgrade/195213/7
I hope it helps you too.