apptainer / apptainer

Apptainer: Application containers for Linux

Home Page: https://apptainer.org

--nvccli fails for NVIDIA_DRIVER_CAPABILITIES=graphics

xqms opened this issue

Hi,

after fixing #2028 I noticed another issue with --nvccli: Setting NVIDIA_DRIVER_CAPABILITIES=graphics does not work (and in fact fails earlier than #2028, so it's a separate issue).

I traced this to a problematic interaction with fuse-overlayfs, but I'm not sure if this is a fuse-overlayfs bug.

nvidia-container-cli will try to create the directory /etc/nvidia/nvidia-application-profiles-rc.d with mode 0555 (code here). Creating such a directory in fuse-overlayfs will fail as illustrated in this minimal example:

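#!/bin/bash
# Set up the overlay layers and the mount point.
mkdir lower upper work final
# Mount the overlay as an unprivileged user.
fuse-overlayfs -o "lowerdir=$(pwd)/lower,upperdir=$(pwd)/upper,workdir=$(pwd)/work" final
# Creating a directory without owner write permission fails.
mkdir -m 0555 final/test || echo "mkdir failed!"
# Unmount again.
fusermount -u final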

which gives

mkdir: cannot create directory ‘final/test’: Permission denied
mkdir failed!

The reason it fails is that fuse-overlayfs will create a "backing file" in its work directory with the specified mode (0555) (code). It will then attempt to set some extended attributes on this file, which fails because it is not writable. This can be seen in this strace of fuse-overlayfs:

write(2, "ovl_mkdir(ino=1, name=test, mode"..., 38ovl_mkdir(ino=1, name=test, mode=365)
) = 38
fstatfs(4, {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=245715567, f_bfree=20763722, f_bavail=8263639, f_files=62480384, f_ffree=56904629, f_fsid={val=[0x5ab33290, 0xff1c871c]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
mkdirat(5, "2", 0555)                   = 0
openat2(5, "2", {flags=O_RDONLY, resolve=RESOLVE_IN_ROOT}, 24) = 9
fsetxattr(9, "trusted.overlay.opaque", "y", 1, 0) = -1 EPERM (Operation not permitted)
fsetxattr(9, "user.fuseoverlayfs.opaque", "y", 1, 0) = -1 EACCES (Permission denied)
unlinkat(5, "2", AT_REMOVEDIR)          = 0
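A note on reading the trace: the `mode=365` printed by `ovl_mkdir` is decimal, and it is the same value as the octal `0555` passed to `mkdirat`. The small sketch below checks that, and also checks that 0555 has no owner write bit, which fits the explanation that the follow-up `fsetxattr` calls are refused. The helper names `mode_to_octal` and `owner_can_write` are made up for illustration and are not part of fuse-overlayfs.

```shell
#!/bin/bash
# Hypothetical helper: print a decimal mode value in octal notation.
mode_to_octal() {
    printf '%o\n' "$1"
}

# Hypothetical helper: succeed if the octal mode in $1 has the owner
# write bit (0200) set.
owner_can_write() {
    (( (8#$1 & 8#200) != 0 ))
}

mode_to_octal 365    # the value logged by ovl_mkdir

for mode in 0555 0755; do
    if owner_can_write "$mode"; then
        echo "$mode: owner may write"
    else
        echo "$mode: owner may not write"
    fi
done
```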

I see three possible fixes:

  1. Use the undocumented -o xattr_permissions=2 option of fuse-overlayfs, which seems to always create work/upper files with the same mode and to store the requested permissions in extended attributes instead. I don't know whether this has any security or portability implications.
  2. File a bug with the nvidia-container-cli maintainers and ask whether they can mkdir with mode 0755 and chmod to 0555 afterwards. That sequence works correctly.
  3. Ugly workaround: Create the /etc/nvidia/nvidia-application-profiles-rc.d directory ourselves (with mode 0755) prior to calling nvidia-container-cli.
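The sequence in the second option (create the directory writable, then restrict it) can be sketched as below. This is only an illustration on an ordinary temporary directory, with no overlay involved. `mkdir_then_chmod` is a hypothetical helper name, not something nvidia-container-cli or Apptainer provides, and `stat -c` assumes GNU coreutils.

```shell
#!/bin/bash
# Hypothetical helper: create a directory with owner write permission first,
# then tighten it to the requested final mode.
#   $1 = path of the directory to create
#   $2 = final mode, e.g. 0555
mkdir_then_chmod() {
    mkdir -m 0755 -- "$1" && chmod -- "$2" "$1"
}

# Demonstration in a scratch directory.
demo_root=$(mktemp -d)
mkdir_then_chmod "$demo_root/test" 0555
stat -c '%a' "$demo_root/test"    # show the resulting mode

# Clean up; removing an empty directory only needs write access to its parent.
rmdir "$demo_root/test"
rmdir "$demo_root"
```

Inside a fuse-overlayfs mount the same two steps would replace the single `mkdir -m 0555` from the minimal example. Keeping the directory writable until the extended attributes are set is the point of the ordering.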

Version of Apptainer

apptainer version 1.3.0-rc.2

Expected behavior

The container starts properly.

Actual behavior

$ NVIDIA_DRIVER_CAPABILITIES=graphics apptainer run --nv --nvccli docker://ubuntu:22.04
INFO:    Using cached SIF image
INFO:    Setting 'NVIDIA_VISIBLE_DEVICES=all' to emulate legacy GPU binding.
INFO:    Setting --writable-tmpfs (required by nvidia-container-cli)
INFO:    Cleanup error: while stopping driver for /var/lib/apptainer/mnt/session/final: fuse-overlayfs exited
FATAL:   container creation failed: nvidia-container-cli failed with exit status 1: nvidia-container-cli: mount error: file creation failed: /var/lib/apptainer/mnt/session/final/etc/nvidia/nvidia-application-profiles-rc.d: permission denied

Steps to reproduce this behavior

see above.

What OS/distro are you running

$ cat /etc/os-release
apptainer version 1.3.0-rc.2

How did you install Apptainer

From GitHub release page.

I think this is related to the second bullet in the Requirements & limitations documentation on --nvccli.

I actually don't think so. I'm not using a setuid install of apptainer and it's automatically using --writable-tmpfs.

It even creates the directory /etc/nvidia without problems, and then fails while creating the subdirectory /etc/nvidia/nvidia-application-profiles-rc.d. So the problem is not that the container directory is not writable, and it's also not that nvidia-container-cli does not have sufficient permissions. It's just fuse-overlayfs not being able to create the directory with the specified permissions (see my minimal shell script example).

The more I think about this, the more I feel this is an actual fuse-overlayfs bug. I will open an issue there as well. But nevertheless I think apptainer should somehow handle/workaround this problem.

OK, I think you may be right. I can reproduce the failure with your fuse-overlayfs script, even with /usr/libexec/apptainer/bin/fuse-overlayfs, which is the latest fuse-overlayfs version. On the other hand, in a root-mapped unprivileged user namespace (that is, unshare -rm), a similar mount with -t overlay lets the mkdir -m 0555 succeed. And if I run fuse-overlayfs in that same user namespace it also works, because the fake root running there can override permissions on files that I own. That makes me wonder again whether this is the root cause of the nvidia-container-cli problem, because fuse-overlayfs and nvidia-container-cli ought to be running in a root-mapped unprivileged user namespace.
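The kernel-overlay variant of the experiment described here might look roughly like the following. It is a sketch under assumptions: unprivileged user namespaces must be enabled, and mounting overlay inside one needs Linux 5.11 or newer. The function name `overlay_mkdir_in_userns` is made up for illustration.

```shell
#!/bin/bash
# Sketch: repeat the mkdir test with the kernel overlay driver inside a
# root-mapped user namespace (unshare -rm). Hypothetical helper name.
overlay_mkdir_in_userns() {
    local dir
    dir=$(mktemp -d) || return 1
    mkdir "$dir/lower" "$dir/upper" "$dir/work" "$dir/final"

    # -r maps the current user to root, -m creates a private mount namespace,
    # so the mount disappears when the inner shell exits.
    unshare -rm sh -c '
        cd "$1" || exit 1
        mount -t overlay overlay \
            -o "lowerdir=$1/lower,upperdir=$1/upper,workdir=$1/work" final || exit 2
        mkdir -m 0555 final/test && echo "mkdir succeeded"
    ' sh "$dir"
    local status=$?

    # The overlay driver can leave restrictive modes behind; relax them
    # before removing the scratch directory.
    chmod -R u+rwx "$dir"
    rm -rf "$dir"
    return "$status"
}
```

Calling `overlay_mkdir_in_userns` returns non-zero when the namespace or the mount cannot be set up, so a failure there does not by itself say anything about the mkdir step.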

Yes, thank you for #2036. I remembered later that fuse-overlayfs, and probably nvidia-container-cli (although I haven't yet verified it), are not run in a root-mapped user namespace; instead they are run with elevated capabilities that were made available via ambientcaps.
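One way to verify which capabilities a running helper actually holds is to read the capability masks the kernel exposes in /proc. The sketch below uses a hypothetical helper, `show_caps`; the `CapAmb` line is the ambient set and is present on Linux 4.3 and newer.

```shell
#!/bin/bash
# Hypothetical helper: print the capability bitmasks of a process.
#   $1 = PID to inspect; defaults to "self", which resolves to the grep
#        process itself (it inherits its sets from this shell).
show_caps() {
    grep -E '^Cap(Inh|Prm|Eff|Bnd|Amb):' "/proc/${1:-self}/status"
}

show_caps
```

While a container is starting, something like `show_caps "$(pgrep -n fuse-overlayfs)"` would show the masks of the most recently started fuse-overlayfs process. A non-zero `CapAmb` value there would be consistent with the ambient-capabilities explanation.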

For the record, in suid mode the image driver does not elevate capabilities for the FUSE programs, because there they are run outside of a user namespace after the /dev/fuse mount is done by root. So that mode would probably also have a problem with creating a mode 0555 directory through fuse-overlayfs. Apptainer doesn't support --nvccli in suid mode anyway, so it's not going to run into this problem.