dtrace4linux / linux

dtrace for linux - kernel driver and userland tools

Home Page:http://crtags.blogspot.com

Entire system freezes on Google Compute Engine VM

jacobsa opened this issue · comments

I'm trying to get dtrace working on a Linux VM running on Google Compute Engine. I can successfully load the kernel module, but the moment I run dtrace my SSH session stops responding (and eventually fails with "Broken pipe"). I can't SSH in again until I reboot the VM.

For example, I tried to do this as a hello world:

% sudo dtrace -n 'syscall::rmdir:entry { @num[pid,execname] = count(); }'
dtrace: description 'syscall::rmdir:entry ' matched 2 probes
Write failed: Broken pipe
ERROR: (gcloud.compute.ssh) [/usr/local/bin/ssh] exited with return code [255].

I don't know the first thing about dtrace, so maybe I'm doing something dumb here. Is this expected?

Here's my version:

% uname -a                                                          
Linux ubuntu 3.19.0-16-generic #16-Ubuntu SMP Thu Apr 30 16:09:58 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

GCE instances run on KVM. I recall testing dtrace in a KVM instance on my own machine,
and it caused repeatable kernel panics, but I didn't investigate much further.
According to Paul, it has something to do with Linux behaving differently because it is
in a VM, but I was under the impression that KVM is full hardware virtualization, so the
guest should not really even be aware that it is virtualized...
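
As a side note on whether a guest can tell that it is virtualized: a Linux guest can usually see this from userland. The following is only a minimal sketch, not something taken from dtrace4linux. The function name `detect_virt` and the two overridable variables are made up for illustration; it assumes the usual `hypervisor` CPU flag in /proc/cpuinfo, the /sys/hypervisor/type file that Xen guests typically expose, and `systemd-detect-virt` where it happens to be installed.

```shell
#!/bin/sh
# Sketch: report what the guest can see about its hypervisor.
# CPUINFO and HV_TYPE_FILE are overridable only so the function can be
# exercised against sample files; detect_virt is a hypothetical name.
CPUINFO="${CPUINFO:-/proc/cpuinfo}"
HV_TYPE_FILE="${HV_TYPE_FILE:-/sys/hypervisor/type}"

detect_virt() {
    # The "hypervisor" CPU flag is normally set when running under a hypervisor.
    if grep -E -q '^flags.*[[:space:]]hypervisor([[:space:]]|$)' "$CPUINFO" 2>/dev/null; then
        echo "hypervisor flag: set"
    else
        echo "hypervisor flag: not set"
    fi

    # This file is typically present on Xen guests and absent elsewhere.
    if [ -r "$HV_TYPE_FILE" ]; then
        echo "hypervisor type: $(cat "$HV_TYPE_FILE")"
    fi

    # systemd-detect-virt prints a short name such as "kvm" or "xen";
    # it exits non-zero when no virtualization is detected.
    if command -v systemd-detect-virt >/dev/null 2>&1; then
        echo "systemd-detect-virt: $(systemd-detect-virt 2>/dev/null || true)"
    fi
}
```

Running `detect_virt` on the affected VM and posting the output would help settle the KVM versus Xen question.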

It's possibly Xen causing the issue. The best thing to try is an Ubuntu
KVM guest running on top of a Xen host (I think Ubuntu defaults to this).

When a panic or freeze occurs, the kernel has likely double-faulted. It can be
helpful to try some other syscall you can control (like chdir) and run tail -f
/var/log/kern.log in the background, or to run on the VM's console. The panic
messages will hit the console, but if you are in an X session or solely
ssh'd in, you will never see the console.

If you are lucky, after a reboot /var/log/kern.log will have the tail end
of the death of the VM before the reboot.

It would be helpful to see /proc/dtrace/trace before doing any dtrace activity; it
might highlight what is missing in the kernel.
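
The steps above could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe. The function names and the `OUT_DIR` location are hypothetical; the two file paths and the dtrace one-liner are the ones mentioned in this thread. If the VM freezes hard, the last lines may never reach disk, so the console remains the more reliable place to look.

```shell
#!/bin/sh
# Sketch only: OUT_DIR and the function names are hypothetical.
KERN_LOG="${KERN_LOG:-/var/log/kern.log}"       # kernel log on Ubuntu
TRACE_FILE="${TRACE_FILE:-/proc/dtrace/trace}"  # provided by the dtrace driver
OUT_DIR="${OUT_DIR:-$HOME/dtrace-debug}"        # where to keep the captures

# Step 1: save the driver's trace output before any dtrace activity.
save_trace() {
    mkdir -p "$OUT_DIR" || return 1
    if [ -r "$TRACE_FILE" ]; then
        cat "$TRACE_FILE" > "$OUT_DIR/trace-before.txt"
    else
        echo "cannot read $TRACE_FILE (is the dtrace driver loaded?)" >&2
        return 1
    fi
}

# Step 2: follow the kernel log in the background, keeping only new lines.
# The PID of the background tail is left in TAIL_PID.
watch_kern_log() {
    mkdir -p "$OUT_DIR" || return 1
    tail -n 0 -f "$KERN_LOG" > "$OUT_DIR/kern-live.txt" 2>&1 &
    TAIL_PID=$!
}

# Step 3: enable a probe on a syscall that is easy to trigger on purpose.
# WARNING: on an affected VM this is the step that freezes the session.
run_probe() {
    sync  # flush what has been captured so far before taking the risk
    sudo dtrace -n 'syscall::chdir:entry { @num[pid,execname] = count(); }'
}
```

A typical run would call `save_trace`, then `watch_kern_log`, then `run_probe`, and finally `cd` somewhere from a second session to fire the probe.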

Thanks for the tips. I ran this:

sudo dtrace -n 'syscall::chdir:entry { @num[pid,execname] = count(); }'

and the SSH session on which I ran it immediately froze up, with dtrace not responding to Ctrl-C, and eventually timed out. Other sessions continued to work, but the system would no longer respond to new SSH connections. I got this NMI watchdog message in /var/log/kern.log.