Trap syscall with SECCOMP_RET_USER_NOTIF

Question

sfc-gh-hyu opened this issue 2 years ago · comments

Description

I am new to gVisor and am thinking about running it on AWS EC2 instances. However, it looks like most EC2 instances do not support nested virtualization, which means I cannot use KVM as the platform and have to go with ptrace instead. I am a little worried about the performance penalty. I am wondering whether it makes sense to leverage the new seccomp user notification capability to trap syscalls and let the sentry perform each syscall on behalf of the container process. https://man7.org/linux/man-pages/man2/seccomp_unotify.2.html The Linux documentation explicitly says that seccomp user notification is designed to let a supervisor process perform syscalls on behalf of the target, which seems to fit the gVisor model. I am hoping this could be faster than ptrace, but I have no idea what the performance impact of this new feature would be.

Is this feature related to a specific bug?

no

Do you have a specific solution in mind?

Use seccomp user notification to trap syscalls.
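For readers unfamiliar with the mechanism, the handshake described in the man page can be modeled roughly as follows. This is a toy, in-process sketch and not the real seccomp API: `Listener`, `trap`, `supervise`, and `run_demo` are hypothetical names, the "target" and "supervisor" are threads rather than processes, and syscalls are identified by name instead of number. Only the shape of the exchange follows the real interface: the target blocks, the supervisor receives a notification carrying an id, pid, and syscall data, and it replies with a value or a negative errno.

```python
import errno
import queue
import threading
from dataclasses import dataclass


@dataclass
class Notif:
    """Loosely mirrors struct seccomp_notif: id, pid, and syscall data."""
    id: int
    pid: int
    nr: str      # assumption: a syscall name for readability, not a number
    args: tuple


@dataclass
class NotifResp:
    """Loosely mirrors struct seccomp_notif_resp: id, val, error."""
    id: int
    val: int
    error: int   # 0 on success, or a negative errno


class Listener:
    """Hypothetical stand-in for the seccomp listener file descriptor."""

    def __init__(self):
        self._pending = queue.Queue()
        self._replies = {}
        self._lock = threading.Lock()
        self._next_id = 1

    def trap(self, pid, nr, args=()):
        """Target side: queue a notification and block until it is answered."""
        reply = queue.Queue(maxsize=1)
        with self._lock:
            notif_id = self._next_id
            self._next_id += 1
            self._replies[notif_id] = reply
        self._pending.put(Notif(notif_id, pid, nr, args))
        resp = reply.get()
        if resp.error:
            raise OSError(-resp.error, f"{nr} rejected by supervisor")
        return resp.val

    def recv(self):
        """Supervisor side: wait for the next trapped syscall."""
        return self._pending.get()

    def send(self, resp):
        """Supervisor side: unblock the target that owns this notification."""
        self._replies.pop(resp.id).put(resp)


def supervise(listener, count):
    """Answer `count` notifications: emulate getpid, refuse everything else."""
    for _ in range(count):
        notif = listener.recv()
        if notif.nr == "getpid":
            # Pretend the sandboxed process is PID 1 inside the container.
            listener.send(NotifResp(notif.id, val=1, error=0))
        else:
            listener.send(NotifResp(notif.id, val=0, error=-errno.ENOSYS))


def run_demo():
    listener = Listener()
    worker = threading.Thread(target=supervise, args=(listener, 2))
    worker.start()
    emulated_pid = listener.trap(pid=4242, nr="getpid")
    try:
        listener.trap(pid=4242, nr="mount")
        refused_with = 0
    except OSError as exc:
        refused_with = exc.errno
    worker.join()
    return emulated_pid, refused_with
```

Calling `run_demo()` exercises one emulated syscall and one refused syscall. Note that every trapped call costs a full round trip to the supervisor, which is where the performance question in this issue comes from.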

Kevin Nisbet · Answer 1 · Thu Apr 21 2022 01:04:27 GMT+0800 (China Standard Time)

I can't speak for the authors at all, but I'm sure they have a lot to consider when it comes to looking at additional ways to intercept syscalls. That man page has lots of caveats that would need to be carefully considered.

But if I circle back to the root of your question: do you actually know that your application performs poorly, or that it is tightly coupled to syscall performance? There are plenty of applications where you could double the time syscalls take and not notice the application performing any differently. Also, if you really need the performance, I believe the metal instances on AWS do allow virtualization, so there is also that option, although I haven't done that myself.

Andrei Vagin · Answer 2 · Thu Apr 21 2022 01:28:43 GMT+0800 (China Standard Time)

The poor performance of the ptrace platform is mostly due to the overhead of context switching between the sentry and stub processes. SECCOMP_RET_USER_NOTIF has the same problem. I have a draft patch, avagin/linux-task-diag@880f5bd, that solves this issue for SECCOMP_NOTIFY.
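To get a feel for the context-switching cost mentioned above, here is a minimal sketch that compares a plain in-process syscall with a request/reply round trip to a second process over pipes. It assumes a POSIX system (it uses `os.fork`), the function names are hypothetical, and it measures neither ptrace nor seccomp. It only illustrates that any design in which a second process must wake up, answer, and hand control back pays for at least two context switches per call.

```python
import os
import time


def in_process_call(n):
    """Average seconds per cheap syscall made directly by this process."""
    start = time.perf_counter()
    for _ in range(n):
        os.getppid()
    return (time.perf_counter() - start) / n


def cross_process_round_trip(n):
    """Average seconds per one-byte request/reply exchange with a child."""
    req_r, req_w = os.pipe()
    rep_r, rep_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: plays the role of the "other side" that must be scheduled
        # in to answer each request.
        os.close(req_w)
        os.close(rep_r)
        while os.read(req_r, 1):   # empty read means the parent closed req_w
            os.write(rep_w, b"x")
        os._exit(0)

    # Parent: send a request, then block until the child replies.
    os.close(req_r)
    os.close(rep_w)
    start = time.perf_counter()
    for _ in range(n):
        os.write(req_w, b"x")
        os.read(rep_r, 1)
    elapsed = time.perf_counter() - start

    os.close(req_w)                # EOF ends the child's loop
    os.close(rep_r)
    os.waitpid(pid, 0)
    return elapsed / n


if __name__ == "__main__":
    direct = in_process_call(100_000)
    remote = cross_process_round_trip(100_000)
    print(f"direct syscall:      {direct * 1e9:8.0f} ns")
    print(f"process round trip:  {remote * 1e9:8.0f} ns")
```

The absolute numbers depend heavily on the kernel, CPU, and scheduler, so treat the output as a rough indication rather than a benchmark of any gVisor platform.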

Haowei Yu · Answer 3 · Thu Apr 21 2022 02:15:17 GMT+0800 (China Standard Time)

@avagin Once your patch is merged upstream into Linux, will you consider implementing a platform using SECCOMP_NOTIFY, or do you have other concerns as well? I have not benchmarked the overhead of ptrace and have no insight into it, so implementing a new platform using SECCOMP_NOTIFY could be complete nonsense.

@knisbet Thanks for your advice. I have to admit that I have not benchmarked my workload yet; it is still an early stage of my investigation. Also, if the workload is not syscall intensive, I guess ptrace performance is probably OK. I don't think I want to use a different instance type, since the existing workload is not running on bare metal instances and switching to a different instance type would be a huge effort.

Andrei Vagin · Answer 4 · Wed Nov 30 2022 03:54:12 GMT+0800 (China Standard Time)

https://lore.kernel.org/lkml/20221111073154.784261-1-avagin@google.com/

github-actions · Answer 5 · Wed Sep 13 2023 08:18:49 GMT+0800 (China Standard Time)

A friendly reminder that this issue had no activity for 120 days.

Andrei Vagin · Answer 6 · Wed Sep 13 2023 08:25:51 GMT+0800 (China Standard Time)

@sfc-gh-hyu My changes have been merged into Linus' tree, and a few months ago we released the systrap platform, which uses seccomp to trap system calls.