google / gvisor

Application Kernel for Containers

Home Page: https://gvisor.dev

Support sandbox network mode in rootless

sfc-gh-cxie opened this issue · comments

Description

I'm not sure if this is WAI (working as intended): is it possible to use runsc run in rootless mode with sandbox network mode?

Based on my understanding, sandbox network mode is more secure, but root mode is less secure. Isn't that contradictory? Thanks.

Is this feature related to a specific bug?

No response

Do you have a specific solution in mind?

No response

commented

#311 answers your second question.

The --rootless option skips a few security layers that can only be set up by root.

Thanks Jing. For some context: I want to run arbitrary code inside gVisor. If we run it in root mode and the arbitrary code breaks gVisor, it will have root access; that's why we want to run it in rootless mode.

I wonder what the suggested approach is here: should I use host network mode (which is less secure from the network perspective), or should I try to work on supporting sandbox network mode with rootless?

If we run it in root mode and the arbitrary code breaks gVisor, it will have root access

gVisor re-executes itself as an unprivileged user before starting the container. Therefore, if the code manages to break gVisor, it will not have root access.

When running with sudo, gVisor only temporarily uses the root privileges in order to create the namespaces and other layers of security (seccomp rules, pivot_root, etc) prior to entering the sandboxed mode (dropping all its privileges and changing its own user to unprivileged at that point) and starting the container code. I suggest reading the architecture docs and security model page for more info.

Thanks Etienne. I'm reading the code, and it seems gVisor only re-executes itself in rootless mode (see log.Infof("*** Re-running as root in new user namespace ***"))? But per what you've said, it seems we should always run gVisor in root mode, with more features and extra security checks?

Hm, this link goes to a private repository? Edit: link fixed

The re-execution I am referring to is here, which sets the UID here (and drops all capabilities on the next line).

it seems we should always run gVisor in root mode, with more features and extra security checks?

Yes. Generally speaking, gVisor is intended to take care of doing whatever is the most secure thing to do given whatever privileges it has.

Thanks for updating the link! Yes, the code you've linked to re-runs itself as "root" within a new namespace; this is different from "root" in the initial user namespace, and does not grant the process any additional privilege within the initial user namespace (which is where "root" access would be sensitive).
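To illustrate the point about "root" inside a new user namespace, here is a minimal Go sketch. It is not gVisor's code: the function name userNSCommand and the single-ID mapping are assumptions made for illustration. It only builds the command description and does not start a process, so it shows the mapping without needing any privilege.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// userNSCommand (hypothetical helper) builds, but does not start, a command
// that would run in a new user namespace. Inside that namespace the caller's
// own UID/GID appear as 0 ("root"); on the host they remain hostUID/hostGID,
// so no extra privilege is gained in the initial user namespace.
func userNSCommand(path string, hostUID, hostGID int, args ...string) *exec.Cmd {
	cmd := exec.Command(path, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUSER,
		UidMappings: []syscall.SysProcIDMap{
			{ContainerID: 0, HostID: hostUID, Size: 1},
		},
		GidMappings: []syscall.SysProcIDMap{
			{ContainerID: 0, HostID: hostGID, Size: 1},
		},
	}
	return cmd
}

func main() {
	cmd := userNSCommand("/proc/self/exe", os.Getuid(), os.Getgid())
	m := cmd.SysProcAttr.UidMappings[0]
	fmt.Printf("UID %d inside the namespace maps to host UID %d\n", m.ContainerID, m.HostID)
}
```

Because only one ID is mapped, the "root" in the child namespace is just the original unprivileged user seen under a different number.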

Moreover, this code executes as part of the runsc run command, which does not run any container code. The process that runs the actual untrusted container is a subprocess of that: it's the runsc boot command, which is the one that goes through the codepath with callSelfAsNobody.

That makes sense, thanks. One more clarification on

gVisor re-executes itself as an unprivileged user before starting the container. Therefore, if the code manages to break gVisor, it will not have root access.

I'm thinking about the case where an attacker escapes the container and gets the original root privileges. Or are we saying that, since we've dropped all capabilities, even an attacker who manages to escape from the container cannot get root privileges? But should we still be concerned about file access?

Or are we saying that, since we've dropped all capabilities, even an attacker who manages to escape from the container cannot get root privileges?

Correct. If the code inside the container manages to get the privileges of the runsc boot process it runs in, all it can do is whatever the runsc boot process is permitted to do. But that process has already dropped all of its privileges before running anything inside the container.

But should we still be concerned about file access?
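As a rough illustration of what "re-executing as an unprivileged user" can look like in Go, here is a sketch that is not taken from gVisor. The helper name asNobodyCommand and the use of 65534 as the "nobody" UID/GID are assumptions (65534 is the conventional value on most Linux systems). The code only builds the command; actually starting it would require enough privilege to switch users.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// nobodyID is the conventional "nobody" UID/GID on most Linux systems.
// This value is an assumption for the sketch, not read from gVisor.
const nobodyID = 65534

// asNobodyCommand (hypothetical helper) builds, but does not start, a command
// configured to run with an unprivileged UID/GID and with no ambient
// capabilities requested for the child.
func asNobodyCommand(path string, args ...string) *exec.Cmd {
	cmd := exec.Command(path, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Credential:  &syscall.Credential{Uid: nobodyID, Gid: nobodyID},
		AmbientCaps: nil, // request no ambient capabilities
	}
	return cmd
}

func main() {
	cmd := asNobodyCommand("/proc/self/exe", "boot")
	cred := cmd.SysProcAttr.Credential
	fmt.Printf("child would run as uid=%d gid=%d\n", cred.Uid, cred.Gid)
}
```

The idea matches the explanation above: whatever the child is later tricked into doing, it can only act with the credentials it was started with.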

Before running any container code, runsc boot also runs in a separate mount namespace, and calls pivot_root on itself to remove its own access to any host file that isn't a container root filesystem file or mounted volume.

Additionally, if you use --directfs=false, the runsc boot process's seccomp rules will forbid use of the open and openat syscalls, i.e. it will not be able to open any host file at all.