pprof runs out of memory while reading cpuprof data
GoogleCodeExporter opened this issue · comments
What steps will reproduce the problem?
1. a very long running trace from cpuprof data (300 MB or more)
2. use pprof to generate a report from it
3. pprof runs out of memory in ReadCPUProfile() in 32-bit Perl
What is the expected output? What do you see instead?
pprof should not try to read the entire file at once and unpack it in
memory, this is very inefficient and capacity-limiting
What version of the product are you using? On what operating system?
0.93 on Linux (Red Hat Enterprise 3)
Please provide any additional information below.
I modified pprof's ReadCPUProfile subroutine to use a "sliding window"
approach to reading the file. Here is the patch (against pprof 0.93 since I
still haven't upgraded to 1.3 due to tcmalloc performance degradation) I
would like to see this bugfix integrated into a future release of pprof.
Thank you
Index: pprof
===================================================================
RCS file: scripts/pprof,v
retrieving revision 1.3
retrieving revision 1.4
diff -b -p -u -r1.3 -r1.4
--- pprof 10 Jun 2009 22:35:14 -0000 1.3
+++ pprof 11 Aug 2009 20:16:54 -0000 1.4
@@ -2054,15 +2054,70 @@ sub ReadProfile {
}
}
+
+# Don't read everything at once, it'll OOM easily.
+# Stream in chunks along a sliding window with small lookback overlap.
+BEGIN {
+ package CPStream;
+
+ sub new {
+ my ($class, $file) = @_;
+ my $self = { file => $file,
+ base => 0,
+ stride => 524288,
+ overlap => 1024,
+ slots => [] };
+ bless $self, $class;
+ $self->overflow();
+ return $self;
+ }
+
+ sub overflow {
+ my ($self) = @_;
+ my $slots = $self->{slots};
+ my $slotsize = $#$slots + 1;
+ my $jumpsize = $slotsize - $self->{overlap};
+ if ($jumpsize > 0) {
+ splice(@$slots, 0, $jumpsize);
+ $self->{base} += $jumpsize;
+ }
+ my $str;
+ my $nbytes = read($self->{file}, $str, $self->{stride});
+ if ($nbytes > 0) {
+ push(@$slots, unpack("L*", $str));
+ }
+ }
+
+ sub get {
+ my ($self, $idx) = @_;
+ my $slots = $self->{slots};
+ while ($#$slots >= 0) {
+ my $cur_min = $self->{base};
+ last if ($idx < $cur_min);
+
+ my $cur_max = $cur_min + $#$slots;
+ if ($idx <= $cur_max) {
+ my $offset = $idx - $cur_min;
+ return $slots->[$offset];
+ }
+ $self->overflow();
+ }
+ return 0;
+ }
+}
+
+
# CPU profile reader
sub ReadCPUProfile {
my $prog = shift;
my $fname = shift;
- # Read entire profile into a string
- my $str;
- my $nbytes = read(PROFILE, $str, (stat PROFILE)[7]); # read entire file
- close(PROFILE);
+ # Read entire profile into a string --> NO, BAD IDEA!!
+ #my $str;
+ #my $nbytes = read(PROFILE, $str, (stat PROFILE)[7]); # read entire file
+ #close(PROFILE);
my $version;
my $period;
@@ -2075,7 +2130,9 @@ sub ReadCPUProfile {
#1) a valid 64-bit profile to use the 32-bit codepath, and
#2) a valid 32-bit profile to be unrecognized.
- my @slots = unpack("L*", $str);
+ #my @slots = unpack("L*", $str);
+
+ my $slots = CPStream->new(*PROFILE);
# Read header. The current header version is a 5-element structure
# containing:
@@ -2093,21 +2150,21 @@ sub ReadCPUProfile {
# 32-bit: 0 3 0 100 0
# 64-bit: 0 0 3 0 0 0 100 0 0 0
#
- if ($#slots < 4 || $slots[0] != 0 ) {
+ if ($slots->get(0) != 0 ) {
error("$fname: not a profile file, or old format profile file\n");
}
- if ($slots[1] >= 3) {
+ if ($slots->get(1) >= 3) {
# Normal 32-bit header:
- $version = $slots[2];
- $period = $slots[3];
- $i = 2 + $slots[1];
+ $version = $slots->get(2);
+ $period = $slots->get(3);
+ $i = 2 + $slots->get(1);
$address_length = 8;
# Parse profile
- while ($i <= $#slots) {
- my $n = $slots[$i++];
- my $d = $slots[$i++];
- if ($slots[$i] == 0) {
+ while ($slots->get($i) != 0) {
+ my $n = $slots->get($i++);
+ my $d = $slots->get($i++);
+ if ($slots->get($i) == 0) {
# End of profile data marker
$i += $d;
last;
@@ -2116,7 +2173,7 @@ sub ReadCPUProfile {
# Make key out of the stack entries
my @k = ();
for (my $j = 0; $j < $d; $j++) {
- my $pc = sprintf("%08x", $slots[$i+$j]);
+ my $pc = sprintf("%08x", $slots->get($i+$j));
$pcs->{$pc} = 1;
push @k, $pc;
}
@@ -2128,28 +2185,31 @@ sub ReadCPUProfile {
# Normal 64-bit header: All entries are doubled in size. The first
# word (little-endian) should contain the real value, the second should
# be zero.
- } elsif ($#slots < 9 || $slots[1] != 0 || $slots[2] < 3 || $slots[3] != 0
- || $slots[5] != 0 || $slots[7] != 0) {
+ } elsif ($slots->get(1) != 0 ||
+ $slots->get(2) < 3 ||
+ $slots->get(3) != 0 ||
+ $slots->get(5) != 0 ||
+ $slots->get(7) != 0) {
error("$fname: not a profile file, or old format profile file\n");
} else {
- $version = $slots[4];
- $period = $slots[6];
- $i = 4 + 2 * $slots[2];
+ $version = $slots->get(4);
+ $period = $slots->get(6);
+ $i = 4 + 2 * $slots->get(2);
$address_length = 16;
# Parse profile
- while ($i <= $#slots) {
- my $n = $slots[$i++];
- my $nhi = $slots[$i++];
+ while ($slots->get($i) != 0) {
+ my $n = $slots->get($i++);
+ my $nhi = $slots->get($i++);
# Huge counts may coerce to floating point, keeping scale, not precision
if ($nhi != 0) { $n += $nhi*(2**32); }
- my $d = $slots[$i++];
- if ($slots[$i++] != 0) {
+ my $d = $slots->get($i++);
+ if ($slots->get($i++) != 0) {
my $addr = sprintf("%o", 4 * $i);
print STDERR "At index $i ($addr):\n";
error("$fname: stack trace depth >= 2**32\n");
}
- if ($slots[$i] == 0 && $slots[$i+1] == 0) {
+ if ($slots->get($i) == 0 && $slots->get($i+1) == 0) {
# End of profile data marker
$i += 2 * $d;
last;
@@ -2158,8 +2218,8 @@ sub ReadCPUProfile {
# Make key out of the stack entries
my @k = ();
for (my $j = $d; $j--; ) {
- my $pclo = $slots[$i++];
- my $pchi = $slots[$i++];
+ my $pclo = $slots->get($i++);
+ my $pchi = $slots->get($i++);
my $pc = sprintf("%08x%08x", $pchi, $pclo);
$pcs->{$pc} = 1;
push @k, $pc;
@@ -2169,7 +2229,12 @@ sub ReadCPUProfile {
}
# Parse map
- my $map = substr($str, $i * 4);
+ #my $map = substr($str, $i * 4);
+
+ my $map = '';
+ seek(PROFILE, $i * 4, 0);
+ read(PROFILE, $map, (stat PROFILE)[7]);
+ close(PROFILE);
my $r = {};
$r->{version} = $version;
Original issue reported on code.google.com by vvv...@gmail.com
on 28 Aug 2009 at 7:02
0.93 is quite old. Can you try again with perftools 1.3 -- I think the file
format
hasn't changed since 0.93, so just using the newer pprof should probably
suffice --
and see if you have the same problem? I must say this report surprises me. We
use
this functionality a lot within google, on very large binaries with lots of CPU
activity, and have never had a problem running out of space. How big is your
.profile? Do you have a simple use-case -- binary plus command-line to run it
--
that you could share that illustrates the problem?
The fix looks pretty straightforward, and I'm not opposed to applying it, but I
want
to understand first whether this is just a symptom of some other, underlying
problem.
Original comment by csilv...@gmail.com
on 31 Aug 2009 at 5:35
Also, in case this patch does end up being the right solution, can you
(electronically) sign the CLA at
http://code.google.com/legal/individual-cla-v1.0.html
Thanks!
craig
Original comment by csilv...@gmail.com
on 31 Aug 2009 at 5:36
- Changed state: Accepted
- Added labels: Priority-Medium, Type-Defect
Yes I had the same problem as of 1.2. I took a glance at 1.3 and the code is the
same. I use pprof with just the binary and the profile, no fancy options. The
fundamental problem is this:
my $nbytes = read(PROFILE, $str, (stat PROFILE)[7]); # read entire file
...
my @slots = unpack("L*", $str);
OOM happened when the file size went beyond 300-400 MB. For me that came from a
trace
of 4-5 hours of a particular software. 32-bit Perl went OOM after 3.5+ GB.
64-bit
Perl did better but it needed 30-40 GB total and nearly 30 minutes just to get
pack
the unpack() step.
So my patch is to address both the memory & runtime bloat.
Thanks for considering it.
Original comment by vvv...@gmail.com
on 3 Sep 2009 at 12:25
OK, I'm glad to know it's still in the current versions of pprof.
I understad that the problems you are seeing are because the profile file is
really
big. But what I don't understand is why it's so big. While I'm not the most
expert
at this part of the system, it seems to me these profiles should typically be
small,
even for executables that run for a long time. That's our experience, at least.
Do you have a simple example you can show, that generates such large profile
files?
That would help us understand if there's some other problem causing too-large
profile
files or not. Then we can know if this patch is the best fix, or something
else.
Original comment by csilv...@gmail.com
on 3 Sep 2009 at 1:50
In the meantime, I've tried applying the patch, and are asking other folks here
what
they think about it, so if it is the best solution, we'll be ready to go with
it.
Original comment by csilv...@gmail.com
on 3 Sep 2009 at 3:51
- Changed state: Started
I think the reason for my very large profile file is because of the deep
stack-trace
in my software (at least 20-30 functions deep, up to 100-120). My application
has a
built-in Tcl interpreter and many commands are extended with C++ underneath.
The C++
code can be very deep sometimes. Each Tcl command nesting can introduce several
function layers. So that all adds up to quite a bit of stacking. Huge credit
goes to
tcmalloc and pprof for handling such a large software without hiccup! except
this one :-)
Original comment by vvv...@gmail.com
on 4 Sep 2009 at 3:12
We ended up getting rid of the look-behind, which isn't really needed, and made
a few
other tweaks, but this basic functionality is in perftools 1.4, just released.
Thanks for the patch!
Original comment by csilv...@gmail.com
on 11 Sep 2009 at 6:59
- Changed state: Fixed