google / gvisor

Application Kernel for Containers

Home Page: https://gvisor.dev

apt-get in newer ubuntu versions segfaults

invliD opened this issue

Description

This is very similar to #7341. I originally believed bash and apt-get crashing on startup had the same underlying cause, but I was wrong. #7389 fixed the issue with bash, but apt-get still crashes.

Just like #7341, this appears to only happen with AMD Ryzen CPUs (works fine on Intel), and only with the kvm platform (ptrace works fine).

Steps to reproduce

Start an ubuntu:jammy container with runsc, attach to it, and run apt-get. It may segfault; if it doesn't, run it again until it does. In the log I'm attaching, it segfaulted in 5 out of 7 tries.
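
Because the crash is intermittent, a small harness that reruns the command and counts segfaults can help quantify it. The following is only a sketch, not something taken from the report: `is_segfault` and `count_segfaults` are hypothetical helper names, and it assumes Python is available wherever the command is run.

```python
import signal
import subprocess

SIGSEGV = int(signal.SIGSEGV)  # 11 on Linux


def is_segfault(returncode):
    """Return True if an exit status indicates death by SIGSEGV.

    subprocess reports a signal death as a negative return code (-11);
    a shell wrapping the command reports it as 128 + 11 = 139.
    """
    return returncode == -SIGSEGV or returncode == 128 + SIGSEGV


def count_segfaults(cmd, tries):
    """Run cmd (a list of arguments) `tries` times and count segfaulting runs."""
    crashes = 0
    for _ in range(tries):
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        if is_segfault(proc.returncode):
            crashes += 1
    return crashes
```

Inside the container, `count_segfaults(["apt-get"], 7)` would repeat the seven tries described above and return how many of them crashed.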

runsc version

runsc version VERSION_MISSING
spec: 1.0.2-dev

This is obviously not very helpful. I compiled my own binary based on v0.0.0-20220422053245-c992cd46cc7e from the go branch due to #7327.

docker version (if using docker)

I am using containerd:

containerd github.com/containerd/containerd 1.5.5-0ubuntu3~20.04.2

uname

Linux <redacted> 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4", GitCommit:"e87da0bd6e03ec3fea7933c4b5263d151aafd07c", GitTreeState:"clean", BuildDate:"2021-02-18T16:12:00Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.14", GitCommit:"57a3aa3f13699cf3db9c52d228c18db94fa81876", GitTreeState:"clean", BuildDate:"2021-12-15T14:47:10Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}

NAME       STATUS     ROLES    AGE     VERSION
<redacted> Ready      <none>   134d    v1.20.2

runsc debug logs (if available)

runsc.log.20220422-181819.749948.boot.log

D0422 18:18:36.579598       1 task_run.go:294] [   9:   9] Unhandled user fault: addr=55b4cbd78700 ip=7f96955a0f27 access=-w- sig=11 err=bad address
D0422 18:18:36.579704       1 task_log.go:87] [   9:   9] Registers:
D0422 18:18:36.579719       1 task_log.go:94] [   9:   9] Cs       = 0000000000000033
D0422 18:18:36.579726       1 task_log.go:94] [   9:   9] Ds       = 0000000000000000
D0422 18:18:36.579731       1 task_log.go:94] [   9:   9] Eflags   = 0000000000011203
D0422 18:18:36.579736       1 task_log.go:94] [   9:   9] Es       = 0000000000000000
D0422 18:18:36.579741       1 task_log.go:94] [   9:   9] Fs       = 0000000000000000
D0422 18:18:36.579746       1 task_log.go:94] [   9:   9] Fs_base  = 00007f969563e800
D0422 18:18:36.579751       1 task_log.go:94] [   9:   9] Gs       = 0000000000000000
D0422 18:18:36.579756       1 task_log.go:94] [   9:   9] Gs_base  = 0000000000000000
D0422 18:18:36.579760       1 task_log.go:94] [   9:   9] Orig_rax = 000055b4cbd52836
D0422 18:18:36.579765       1 task_log.go:94] [   9:   9] R10      = fffffffffffffff7
D0422 18:18:36.579770       1 task_log.go:94] [   9:   9] R11      = 0000000000000061
D0422 18:18:36.579775       1 task_log.go:94] [   9:   9] R12      = 0000000000000156
D0422 18:18:36.579781       1 task_log.go:94] [   9:   9] R13      = 000055b4cbd301c6
D0422 18:18:36.579786       1 task_log.go:94] [   9:   9] R14      = 0000000000000001
D0422 18:18:36.579791       1 task_log.go:94] [   9:   9] R15      = 0000000000000000
D0422 18:18:36.579796       1 task_log.go:94] [   9:   9] R8       = ffffffffffffffe0
D0422 18:18:36.579812       1 task_log.go:94] [   9:   9] R9       = 0000000000000000
D0422 18:18:36.579817       1 task_log.go:94] [   9:   9] Rax      = 000055b4cbd52836
D0422 18:18:36.579824       1 task_log.go:94] [   9:   9] Rbp      = 00007f969561a780
D0422 18:18:36.579829       1 task_log.go:94] [   9:   9] Rbx      = 0000000000000156
D0422 18:18:36.579834       1 task_log.go:94] [   9:   9] Rcx      = 0000000000000020
D0422 18:18:36.579838       1 task_log.go:94] [   9:   9] Rdi      = 000055b4cbd76700
D0422 18:18:36.579843       1 task_log.go:94] [   9:   9] Rdx      = 0000000000000136
D0422 18:18:36.579849       1 task_log.go:94] [   9:   9] Rip      = 00007f96955a0f27
D0422 18:18:36.579853       1 task_log.go:94] [   9:   9] Rsi      = 000055b4cbd54110
D0422 18:18:36.579858       1 task_log.go:94] [   9:   9] Rsp      = 00007fa6cb6081c8
D0422 18:18:36.579863       1 task_log.go:94] [   9:   9] Ss       = 000000000000002b

7:  c5 fd e7 a7 00 10 00    vmovntdq YMMWORD PTR [rdi+0x1000],ymm4
e:  00
f:  c5 fd e7 af 20 10 00    vmovntdq YMMWORD PTR [rdi+0x1020],ymm5
16: 00
17: c5 fd e7 b7 40 10 00    vmovntdq YMMWORD PTR [rdi+0x1040],ymm6
1e: 00
1f: c5 fd e7 bf 60 10 00    vmovntdq YMMWORD PTR [rdi+0x1060],ymm7
26: 00
27: c5 7d e7 87 00 20 00    vmovntdq YMMWORD PTR [rdi+0x2000],ymm8
2e: 00
2f: c5 7d e7 8f 20 20 00    vmovntdq YMMWORD PTR [rdi+0x2020],ymm9
36: 00
37: c5 7d e7 97 40 20 00    vmovntdq YMMWORD PTR [rdi+0x2040],ymm10

@invliD could you try to reproduce this issue with the following patch?

diff --git a/runsc/boot/loader.go b/runsc/boot/loader.go
index 1d5918a55..f91684bc6 100644
--- a/runsc/boot/loader.go
+++ b/runsc/boot/loader.go
@@ -355,7 +355,7 @@ func New(args Args) (*Loader, error) {
        // Initiate the Kernel object, which is required by the Context passed
        // to createVFS in order to mount (among other things) procfs.
        if err = k.Init(kernel.InitKernelArgs{
-               FeatureSet:                  cpuid.HostFeatureSet().Fixed(),
+               FeatureSet:                  cpuid.HostFeatureSet(),
                Timekeeper:                  tk,
                RootUserNamespace:           creds.UserNamespace,
                RootNetworkNamespace:        netns,

That patch doesn't appear to change anything for apt-get, except I'm back to even bash segfaulting (see #7341).

Here's a log of a run of sh using ubuntu:jammy, exec'ing into the container with another sh, and running bash and then apt-get, both crashing:
runsc.log.20220426-195323.478653.boot.log

This is really strange... On my amd hosts, it works. I was able to reproduce #7341, but I can't reproduce this issue.

I found that newer glibc versions use more extended cpuid functions. Could you try out avagin@e35c8d9? If the issue still reproduces with this fix, I will need logs.

With your new patch bash no longer crashes, but apt-get still does. Here's the log of a run of ubuntu:jammy with sh, exec'ing another sh, running bash (works) and then apt-get from there. Segfault.
runsc.log.20220430-002317.594408.boot.txt

@invliD could you show /proc/cpuinfo from the host and a container?

Here's the first of 32 cores as seen by the host:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 25
model		: 33
model name	: AMD Ryzen 9 5950X 16-Core Processor
stepping	: 0
microcode	: 0xa201009
cpu MHz		: 2195.427
cache size	: 512 KB
physical id	: 0
siblings	: 32
core id		: 0
cpu cores	: 16
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 16
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca
bugs		: sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips	: 6787.02
TLB size	: 2560 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

and from a runsc container (with your last patch):

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 175
model		: 33
model name	: unknown
stepping	: unknown
cpu MHz		: 2194.952
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clwb sha_ni xsaveopt xsavec xgetbv1 xsaves umip pku ospke vaes vpclmulqdq rdpid
bogomips	: 2194.95
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:
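
The two dumps are easier to compare mechanically than by eye. As a minimal sketch (the helper names `cpu_flags` and `missing_flags` are made up for illustration), the flags lines can be turned into sets and subtracted to list what the host advertises but the container does not:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(host_text, container_text):
    """Flags the host reports that the container does not, sorted by name."""
    return sorted(cpu_flags(host_text) - cpu_flags(container_text))


# Shortened excerpts of the dumps above; paste the full text in practice.
host = "vendor_id\t: AuthenticAMD\nflags\t\t: fpu avx avx2 clflushopt clwb clzero\n"
container = "vendor_id\t: AuthenticAMD\nflags\t\t: fpu avx avx2 clwb\n"

print(missing_flags(host, container))  # ['clflushopt', 'clzero']
```

Run against the full dumps, the same function would list every host flag that is absent inside the sandbox.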

1. Could you try updating your ubuntu image? In the log, apt-get reports 2.4.1.

I0430 00:23:23.395590 1 strace.go:593] [ 7: 7] apt-get E write(0x1 host:[5], 0x5563fa1056e0 "apt 2.4.1 (amd64)\n", 0x12)

The current version is 2.4.5:

$ docker run -it --rm ubuntu:jammy apt-get | head -n 1
apt 2.4.5 (amd64)

2. I added more debug output to runsc. The binary is here: https://github.com/avagin/gvisor/actions/runs/2260779574. Could you install it and show the output of "/lib64/ld-linux-x86-64.so.2 --list-diagnostics" from a native container and a gVisor container? I will also need the runsc logs.
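
The version check in the first point can also be done directly on the strace log rather than by reading it. This is a rough sketch: `apt_version` is a hypothetical helper, and the regular expression assumes the write() line format quoted above.

```python
import re

VERSION_RE = re.compile(r'"apt (\d+(?:\.\d+)*) ')


def apt_version(strace_line):
    """Return the apt version in a strace write() line as a tuple of ints, or None."""
    m = VERSION_RE.search(strace_line)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


# The strace line quoted in this comment.
line = ('I0430 00:23:23.395590 1 strace.go:593] [ 7: 7] apt-get E write(0x1 host:[5], '
        '0x5563fa1056e0 "apt 2.4.1 (amd64)\\n", 0x12)')

logged = apt_version(line)
print(logged)              # (2, 4, 1)
print(logged < (2, 4, 5))  # True
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically.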

@invliD friendly ping

@avagin

$ docker run -it --rm --runtime=runsc ubuntu:jammy bash -c 'apt-get | head -1'
apt 2.4.6 (amd64)
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 25
model           : 33
model name      : AMD Ryzen 9 5900X 12-Core Processor
stepping        : 0
microcode       : 0xa201016
cpu MHz         : 2200.000
cache size      : 512 KB
physical id     : 0
siblings        : 24
core id         : 0
cpu cores       : 12
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 16
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
bugs            : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 7386.08
TLB size        : 2560 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

@pkit Could you give me ssh access to this host for a day or two?

That's my workstation, I'm not sure I can do the passthrough. But will check it out.

@avagin Please send me your ssh public key, and I will set up a bouncer.

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCW0qQXF7qUOfmP0z3Ny6AlEPMjOoYd/aflfYXP6b8WAIv+YmwGEo0lCvXDuzG5i92YmuWgNXuxisLMnOzZu6WqBhRRGqiZFSFHCqvWLo1bNWAPpQC3eZTkGu3DbgV6V2hmH8V7zcuq12efrrv+YiVjFKVIiGEYV/QBtLbSugMZtUiDyfD76zoiKHgX0p30TkpuJEdsRTFC6QuXYSPINAADSR/+u6y8NxpYZ6zumUaoI6ugeMkVXqf/OELZP4aN6Lq/q9Fg28OJootSVI5+Q8vHb2fVcd2k4KH8RLDHJWuJN2RkozwkzlnmRDysbLA1WpF2rwk9+k5pJ8SmAcdO6f7n5sk+wndF1rSE72shJno/h+ZEZ+W8CcMc4L+w9XHKnj+UKJfZKS2xC5naXetV8y5WrKGjB5yoCIa7+r0Bfw3mWxkNxa33gmpP0+nzpEb9aENZVUrPKBIH9U8y77OAZ+Bd92zTArdqxcHXQndep/3Ea1nx+aM0nUGlD+dFQ5PGvw8= avagin@avagin

ssh to avagin@44.202.166.200; from there you can ssh -p 9001 localhost to reach the actual box via a permanent tunnel.

I am able to login to the test host. Thanks a lot!

Ok, I will need to add you to the sudo group, because there is no way to test runsc without root, it seems :(

@pkit How do you reproduce the issue?

there is no way to test runsc without root, it seems :(

We have the rootless mode:

$ runsc --rootless --platform kvm --network host do something

but the current user has to be in the kvm group to use /dev/kvm.

I tried to run apt-get but it didn't crash:

$ docker run -it --rm --privileged --device /dev/kvm -v /usr/local/bin/:/usr/local/bin/ ubuntu:jammy \
    runsc  -TESTONLY-unsafe-nonroot --platform kvm --ignore-cgroups --rootless --network host do \
    bash -c 'for i in `seq 100`; do apt-get; ret=$?; if [ $ret -ne 1 ]; then echo $ret; break; fi; done'

Yup, that's what I've said :)
It works on the same stepping as the initial bug reporter.
So I suppose it can be closed?

I thought you wanted to test something else.

Ah. I misunderstood you. Thanks again for your cooperation.