google / gvisor

Application Kernel for Containers

Home Page: https://gvisor.dev

Trap syscall with SECCOMP_RET_USER_NOTIF

sfc-gh-hyu opened this issue

Description

I am new to gVisor and am thinking about running gVisor on AWS EC2 instances. However, it looks like most EC2 instance types do not support nested virtualization, which means I cannot use KVM as the platform and have to go with ptrace instead. I am a little worried about the performance penalty. I am wondering whether it makes sense to leverage the new seccomp user notification capability to trap syscalls and let the Sentry perform each syscall on behalf of the container process (https://man7.org/linux/man-pages/man2/seccomp_unotify.2.html). The Linux documentation explicitly states that seccomp user notification is designed to let a supervisor process handle system calls on behalf of the target, which seems to fit the gVisor model. I am hoping this could be faster than ptrace, but I have no idea what the performance impact of this new feature is.
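To make the mechanism in the man page concrete, here is a minimal sketch, assuming a Linux 5.0+ kernel and headers. A target thread installs a filter that returns `SECCOMP_RET_USER_NOTIF` for `getppid()` and obtains a listener fd via `SECCOMP_FILTER_FLAG_NEW_LISTENER`. The supervisor (the calling thread) then receives the notification with `SECCOMP_IOCTL_NOTIF_RECV` and answers with `SECCOMP_IOCTL_NOTIF_SEND` instead of letting the kernel execute the call. The helper name `emulate_getppid_via_user_notif` and the choice of `getppid()` are hypothetical illustrations. This is not how gVisor implements any platform. Note that every trapped call blocks the target until the supervisor is scheduled, reads the notification, and replies.

```c
#define _GNU_SOURCE
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* State shared between the supervisor (caller) and the target thread. */
struct target_state {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int ready;       /* set once the target tried to install its filter */
    int listener_fd; /* seccomp listener fd, or -1 if setup failed */
    long result;     /* what getppid() returned inside the target */
};

/* Target: route getppid() to a user-space supervisor, publish the
 * listener fd, then make the trapped call. */
static void *target_thread(void *arg) {
    struct target_state *st = arg;
    /* Sketch only: a real filter must also check seccomp_data.arch. */
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    int fd = -1;
    /* Without SECCOMP_FILTER_FLAG_TSYNC the filter covers only this thread,
     * so the supervisor (the calling thread) stays unfiltered. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == 0)
        fd = (int)syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER,
                          SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);

    pthread_mutex_lock(&st->mu);
    st->listener_fd = fd < 0 ? -1 : fd;
    st->ready = 1;
    pthread_cond_signal(&st->cv);
    pthread_mutex_unlock(&st->mu);

    if (fd >= 0)
        st->result = syscall(SYS_getppid); /* blocks until the supervisor replies */
    return NULL;
}

/* Hypothetical helper: answers the target's getppid() with `emulated`
 * instead of letting the kernel execute it. Returns the value the target
 * observed, or -1 if seccomp user notification could not be set up. */
long emulate_getppid_via_user_notif(long emulated) {
    struct target_state st = { .ready = 0, .listener_fd = -1, .result = -1 };
    pthread_mutex_init(&st.mu, NULL);
    pthread_cond_init(&st.cv, NULL);

    pthread_t tid;
    long observed = -1;
    if (pthread_create(&tid, NULL, target_thread, &st) == 0) {
        pthread_mutex_lock(&st.mu);
        while (!st.ready)
            pthread_cond_wait(&st.cv, &st.mu);
        int fd = st.listener_fd;
        pthread_mutex_unlock(&st.mu);

        int replied = 0;
        if (fd >= 0) {
            struct seccomp_notif req;
            struct seccomp_notif_resp resp;
            memset(&req, 0, sizeof(req)); /* recent kernels require a zeroed buffer */
            if (ioctl(fd, SECCOMP_IOCTL_NOTIF_RECV, &req) == 0) {
                memset(&resp, 0, sizeof(resp));
                resp.id = req.id;    /* echo the notification id */
                resp.error = 0;      /* report success ... */
                resp.val = emulated; /* ... with this return value */
                replied = ioctl(fd, SECCOMP_IOCTL_NOTIF_SEND, &resp) == 0;
            }
            /* Closing the listener fails any still-pending trapped call
             * with ENOSYS, so the join below cannot hang. */
            close(fd);
        }
        pthread_join(tid, NULL);
        if (replied)
            observed = st.result;
    }
    pthread_cond_destroy(&st.cv);
    pthread_mutex_destroy(&st.mu);
    return observed;
}
```

Calling the helper from `main` returns the emulated value on a kernel with user-notification support, or -1 where the filter cannot be installed (for example, inside a sandbox that blocks `seccomp()`). The supervisor here only fabricates a return value. A Sentry-like supervisor would instead have to implement the syscall's full semantics, including reading and writing target memory.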

Is this feature related to a specific bug?

no

Do you have a specific solution in mind?

Use seccomp user notification to trap syscalls.

I can't speak for the authors at all, but I'm sure they've got a lot to consider when it comes to looking at additional ways to intercept syscalls. That man page has lots of caveats in it that would need to be carefully considered.

But if I circle back to the root of your question: do you actually know that your application performs poorly, or that it is tightly coupled to syscall performance? There are plenty of applications where you could double the time syscalls take without noticing any difference in how the application performs. Also, if you really need the performance, I believe the metal instances on AWS do allow virtualization, so there is also that option, although I haven't done that myself.

The poor performance of the ptrace platform is mostly due to the overhead of context switching between the Sentry and stub processes. SECCOMP_RET_USER_NOTIF has the same problem. I have a draft patch, avagin/linux-task-diag@880f5bd, that solves this issue for seccomp user notifications.
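One rough way to put numbers on this overhead is a syscall microbenchmark that is compiled once and run both natively and under runsc with different `--platform` values (for example `ptrace`, and `systrap` where available). This is a sketch under stated assumptions. It times raw `getppid()` calls, which do almost no work in the kernel, so the per-call figure mostly reflects the cost of trapping and switching rather than useful work. The helper name `avg_syscall_ns` is hypothetical. The result says nothing about how much a particular application's runtime depends on syscalls.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average wall-clock cost in nanoseconds of one raw
 * getppid() system call, measured over `iterations` calls. getppid() does
 * almost no work in the kernel, so the figure mostly reflects the cost of
 * entering and leaving it (or of being intercepted by a sandbox). */
double avg_syscall_ns(long iterations) {
    if (iterations <= 0)
        return 0.0;
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getppid); /* raw syscall, never cached by libc */
    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9 +
                        (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Call it from `main` with a few hundred thousand iterations and print the result. Comparing the native figure with the figures under each gVisor platform gives a per-syscall overhead, which can then be weighed against how many syscalls the real workload makes.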

@avagin Once your patch is merged into upstream Linux, would you consider implementing a platform based on seccomp user notifications, or do you have other concerns as well? I certainly have not benchmarked the overhead of the ptrace platform and have no insight into it, so implementing a new platform on top of seccomp user notifications could be complete nonsense.

@knisbet Thanks for your advice. I have to admit that I have not benchmarked my workload; it's still an early stage of my investigation. Also, if the workload is not syscall-intensive, I guess ptrace performance is probably OK. I don't want to use a different instance type: I don't think the existing workload runs on bare-metal instances, and switching to a different instance type would be a huge effort.

A friendly reminder that this issue had no activity for 120 days.

@sfc-gh-hyu my changes have been merged into Linus' tree, and a few months ago we released the systrap platform, which uses seccomp to trap system calls.