Request for new "declfile" source type
ertucci opened this issue · comments
I’m trying to use bloaty to divide up the memory usage of a project by directory structure. The compileunits source type seemed to be the closest fit to what I was looking for, so I used it as the base type for my custom directory structure data type. However, I noticed that constants inlined within header files weren’t placed into a compileunit named with the path to the file, as most symbols were. They were instead rolled up in a generic [section .rodata] compileunit. I’ve tried to reproduce a simple version of this behavior using the source code at the bottom, compiled with gcc and clang.
The symbol, sample_array, should be 40 bytes in size, defined in array.h, and used in translation_unit1.cc and translation_unit2.cc. However, bloaty analysis of the clang executable puts it under main.cc, with a strange size attribution because it is additionally counted in the .eh_frame_hdr section.
$ bloaty bloat_header_inline_example_clang.elf -d sections,compileunits,symbols --domain=vm --source-filter=sample_array
VM SIZE
--------------
76.9% 40 .rodata
  100.0% 40 main.cc
    100.0% 40 sample_array
23.1% 12 .eh_frame_hdr
  100.0% 12 main.cc
    100.0% 12 sample_array
100.0% 52 TOTAL
For gcc, the compileunit name is not intuitive and therefore it is difficult to attribute symbols defined in header inlines to the appropriate source code.
$ bloaty bloat_header_inline_example_gcc.elf -d sections,compileunits,symbols --domain=vm --source-filter=sample_array
VM SIZE
--------------
100.0% 40 .rodata
  100.0% 40 QUIET_NAN__ 1
    100.0% 40 sample_array
100.0% 40 TOTAL
Looking at the dwarfdump of both executables, it looks like the information I want is present under the DW_AT_decl_file tag.
COMPILE_UNIT<header overall offset = 0x000000b1>:
< 0><0x0000000c> DW_TAG_compile_unit
DW_AT_producer (indexed string: 0x00000000)Fuchsia clang version 15.0.0 (https://llvm.googlesource.com/a/llvm-project 3a20597776a5d2920e511d81653b4d2b6ca0c855)
DW_AT_language DW_LANG_C_plus_plus_14
DW_AT_name (indexed string: 0x00000001)translation_unit2.cc
DW_AT_str_offsets_base 0x00000058
DW_AT_stmt_list 0x000000f0
DW_AT_comp_dir (indexed string: 0x00000002)
DW_AT_low_pc (addr_index: 0x00000001)0x00001880
DW_AT_high_pc <offset-from-lowpc> 56 <highpc: 0x000018b8>
DW_AT_addr_base 0x00000030
LOCAL_SYMBOLS:
< 1><0x00000035> DW_TAG_variable
DW_AT_name (indexed string: 0x00000005)sample_array
DW_AT_type <0x00000040>
DW_AT_external yes(1)
DW_AT_decl_file 0x00000001 /array.h
DW_AT_decl_line 0x00000003
DW_AT_location len 0x0002: 0xa100:
DW_OP_addrx 0
Is it possible to create a different source type which is primarily based on the decl file (as this is essentially what I want and compileunits was just as close as I could get)?
main.h
#pragma once
int main();
main.cc
#include "translation_unit1.h"
#include "translation_unit2.h"
int main() {
  int i = 0;
  while (1) {
    increment_i_1(&i);
    increment_i_2(&i);
  }
  return 0;
}
array.h
#pragma once
inline constexpr int sample_array[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
translation_unit1.h
#pragma once
void increment_i_1(int* i);
translation_unit1.cc
#include "translation_unit1.h"
#include "array.h"
const int i_1 = 2;
void increment_i_1(int* i) {
  *i += sample_array[*i % 10];
  *i += i_1;
}
translation_unit2.h
#pragma once
void increment_i_2(int* i);
translation_unit2.cc
#include "translation_unit2.h"
#include "array.h"
const int i_2 = 3;
void increment_i_2(int* i) {
  *i += sample_array[*i % 10];
  *i += i_2;
}
Thanks for the detailed report!
I think there are two key questions in here:
- Why is the existing compileunits data source failing to attribute sample_array?
- Would it be possible to add a declfile data source that follows source files instead of compileunits?
Both are good questions.
Debugging the bug
Looking at the dwarfdump of both executables, it looks like the information I want is present under the DW_AT_decl_file tag.
< 0><0x0000000c> DW_TAG_compile_unit
DW_AT_producer (indexed string: 0x00000000)Fuchsia clang version 15.0.0 (https://llvm.googlesource.com/a/llvm-project 3a20597776a5d2920e511d81653b4d2b6ca0c855)
DW_AT_language DW_LANG_C_plus_plus_14
DW_AT_name (indexed string: 0x00000001)translation_unit2.cc
DW_AT_str_offsets_base 0x00000058
DW_AT_stmt_list 0x000000f0
DW_AT_comp_dir (indexed string: 0x00000002)
DW_AT_low_pc (addr_index: 0x00000001)0x00001880
DW_AT_high_pc <offset-from-lowpc> 56 <highpc: 0x000018b8>
DW_AT_addr_base 0x00000030
LOCAL_SYMBOLS:
< 1><0x00000035> DW_TAG_variable
DW_AT_name (indexed string: 0x00000005)sample_array
DW_AT_type <0x00000040>
DW_AT_external yes(1)
DW_AT_decl_file 0x00000001 /array.h
DW_AT_decl_line 0x00000003
DW_AT_location len 0x0002: 0xa100:
DW_OP_addrx 0
Unfortunately this debug entry does not give usable values for the two attributes Bloaty generally depends on to attribute an entry to a section of the binary:
- DW_AT_location: is present, but the provided address is 0, probably due to identical code folding that merged the two copies of this variable into one.
- DW_AT_linkage_name: not present, but if it were present it would give us a name we can look up in the symbol table.
Without one of these two, Bloaty doesn't know which part of the binary sample_array corresponds to.
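The two-step fallback described above can be sketched roughly as follows. This is a simplified illustration, not Bloaty's actual code: `VariableDie`, `SymbolTable`, and `ResolveAddress` are hypothetical names, and treating a location of 0 as unusable simply follows the reading of the dump given in this comment.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified view of a DW_TAG_variable entry.
struct VariableDie {
  std::string name;                         // DW_AT_name
  std::optional<std::uint64_t> location;    // address from DW_AT_location, if any
  std::optional<std::string> linkage_name;  // DW_AT_linkage_name, if any
};

// Hypothetical symbol table: linkage name -> address.
using SymbolTable = std::map<std::string, std::uint64_t>;

// Returns the address the entry can be attributed to, or nullopt when
// neither attribute yields one.
inline std::optional<std::uint64_t> ResolveAddress(const VariableDie& die,
                                                   const SymbolTable& symtab) {
  // First choice: a usable (nonzero) address from DW_AT_location.
  if (die.location && *die.location != 0) {
    return die.location;
  }
  // Second choice: look the linkage name up in the symbol table.
  if (die.linkage_name) {
    auto it = symtab.find(*die.linkage_name);
    if (it != symtab.end()) {
      return it->second;
    }
  }
  // Neither attribute helps, so the entry cannot be placed in the binary.
  return std::nullopt;
}
```

Under these assumptions, an entry like the one for sample_array (location 0, no linkage name) resolves to nothing, while the same entry with a linkage name of sample_array would resolve through the symbol table.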
When I look in the symbol table (binary compiled with Clang), I see an address of 0x2010 for the sample_array symbol:
$ readelf -Ws main | grep sample_array
31: 0000000000002010 40 OBJECT WEAK DEFAULT 17 sample_array
But when I dump the debug info, unfortunately there is no DIE that references this address:
$ readelf --debug-dump=info main | grep 2010
$
Looking at verbose Bloaty output, it looks like the only way Bloaty was able to attribute sample_array to main.cc at all was by disassembling the binary:
$ ~/code/bloaty/bloaty -vvv -d compileunits main | grep -A1 compileunits.*\\[2010
[compileunits, x86_disassemble] AddVMRangeForVMAddr(1193, [2010, ffffffffffffffff])
-> translates to: [2010 ffffffffffffffff]
--
[compileunits, x86_disassemble] AddVMRangeForVMAddr(11d3, [2010, ffffffffffffffff])
-> translates to: [2010 ffffffffffffffff]
$
The address was referenced from two different functions, and it was a matter of luck which one Bloaty found first. Looking at the symbol table, the two addresses 0x1193 and 0x11d3 fall inside the functions increment_i_1(int*) and increment_i_2(int*), respectively:
$ readelf -Ws --demangle main
Symbol table '.symtab' contains 40 entries:
Num: Value Size Type Bind Vis Ndx Name
[...]
33: 0000000000001170 53 FUNC GLOBAL DEFAULT 15 increment_i_1(int*)
34: 0000000000001130 50 FUNC GLOBAL DEFAULT 15 main
35: 00000000000011b0 53 FUNC GLOBAL DEFAULT 15 increment_i_2(int*)
[...]
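To double-check that mapping by hand, here is a minimal sketch of a containment lookup over the three function symbols listed above. The `Symbol` struct and the `ExampleSymbols` and `SymbolContaining` helpers are illustrative names, not part of Bloaty or readelf; the addresses and sizes are copied from the listing.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// One row of the symbol table: start address, size in bytes, and name.
struct Symbol {
  std::uint64_t value;
  std::uint64_t size;
  std::string name;
};

// The three function symbols from the readelf listing.
inline std::vector<Symbol> ExampleSymbols() {
  return {
      {0x1170, 53, "increment_i_1(int*)"},
      {0x1130, 50, "main"},
      {0x11b0, 53, "increment_i_2(int*)"},
  };
}

// Returns the name of the symbol whose half-open range
// [value, value + size) contains addr, or nullopt if none does.
inline std::optional<std::string> SymbolContaining(
    const std::vector<Symbol>& symbols, std::uint64_t addr) {
  for (const Symbol& sym : symbols) {
    if (addr >= sym.value && addr - sym.value < sym.size) {
      return sym.name;
    }
  }
  return std::nullopt;
}
```

With these values, 0x1193 lies 0x23 bytes into increment_i_1(int*) and 0x11d3 lies 0x23 bytes into increment_i_2(int*), so each function references the array at the same offset in its body.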
So why did it reference main.cc instead of translation_unit1.cc or translation_unit2.cc? Looking at the verbose Bloaty output, it looks like this came from dwarf_pcpair:
$ ~/code/bloaty/bloaty -vvv -d compileunits main | grep compileunits.*1170
[compileunits, dwarf_pcpair] AddVMRange(main.cc, 1170, 35)
[compileunits, dwarf_pcpair] AddVMRange(main.cc, 1170, 35)
[compileunits, dwarf_fde_table] AddFileRangeForVMAddr(1170, [2064, 8])
[compileunits, dwarf_fde] AddFileRangeForVMAddr(1170, [2120, 18])
[compileunits, elf_symtab_name] AddFileRangeForVMAddr(1170, [3c75, 14])
[compileunits, elf_symtab_sym] AddFileRangeForVMAddr(1170, [3a18, 18])
But when I dump the debug info looking for this DW_AT_low_pc=0x1170, DW_AT_high_pc=0x35, this pcpair only appears for the translation_unit1.cc compileunit:
<0><86>: Abbrev Number: 1 (DW_TAG_compile_unit)
<87> DW_AT_producer : (indexed string: 0): Debian clang version 14.0.6
<88> DW_AT_language : 33 (C++14)
<8a> DW_AT_name : (indexed string: 0x1): translation_unit1.cc
<8b> DW_AT_str_offsets_base: 0x38
<8f> DW_AT_stmt_list : 0x90
<93> DW_AT_comp_dir : (indexed string: 0x2): /tmp/t
<94> DW_AT_low_pc : (index: 0x1): 0x1170
<95> DW_AT_high_pc : 0x35
<99> DW_AT_addr_base : 0x28
So this looks like a bug in Bloaty. The range should have been attributed to translation_unit1.cc, not main.cc. It looks like a bug in the handling of indexed strings.
On the declfile proposal
You asked this question:
Is it possible to create a different source type which is primarily based on the decl file (as this is essentially what I want and compileunits was just as close as I could get)?
Unfortunately the DW_TAG_variable debugging entry you quoted before doesn't give enough information to attribute this to any specific part of the binary:
DW_AT_name (indexed string: 0x00000005)sample_array
DW_AT_type <0x00000040>
DW_AT_external yes(1)
DW_AT_decl_file 0x00000001 /array.h
DW_AT_decl_line 0x00000003
DW_AT_location len 0x0002: 0xa100:
DW_OP_addrx 0
It does contain DW_AT_name=sample_array, and in this case sample_array happens to also be the linkage name. But this is just a coincidence and cannot be relied on. Suppose we change the program slightly so the function definitions look like this:
void increment_i_2(int* i) {
  static int sample_array[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  sample_array[*i]++;
  *i += sample_array[*i % 10];
  *i += i_2;
}
void increment_i_1(int* i) {
  static int sample_array[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  sample_array[*i]++;
  *i += sample_array[*i % 10];
  *i += i_1;
}
The debug info will still refer to these variables as sample_array, because that is their name in the code:
<2><72>: Abbrev Number: 3 (DW_TAG_variable)
<73> DW_AT_name : (indexed string: 0x3): sample_array
<74> DW_AT_type : <0x89>
<78> DW_AT_decl_file : 0
<79> DW_AT_decl_line : 7
<7a> DW_AT_location : (DW_OP_addrx <0>)
However the linkage names of these variables are now different, because each function needs its own copy of sample_array:
$ readelf -Ws main | grep sample_array
13: 0000000000004010 40 OBJECT LOCAL DEFAULT 25 _ZZ13increment_i_1PiE12sample_array
15: 0000000000004040 40 OBJECT LOCAL DEFAULT 25 _ZZ13increment_i_2PiE12sample_array
So unfortunately using DW_TAG_variable entries to attribute this info to a given declfile doesn't appear to be viable.
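To make the ambiguity concrete, here is a rough sketch that asks which symbols could plausibly belong to a given DW_AT_name. It relies on a heuristic, not a real demangler: under the Itanium C++ ABI a source name is encoded as its length followed by the identifier, so a mangled local static ends in `12sample_array`. The helper names `MayNameVariable` and `Candidates` are made up for illustration.

```cpp
#include <string>
#include <vector>

// Heuristic check: could `symbol` be the linkage name of a variable whose
// source-level name is `source_name`? True if the symbol is the plain name
// or ends with the length-prefixed form "<len><name>".
inline bool MayNameVariable(const std::string& symbol,
                            const std::string& source_name) {
  if (symbol == source_name) {
    return true;  // unmangled, as with the original inline constexpr array
  }
  const std::string suffix =
      std::to_string(source_name.size()) + source_name;
  return symbol.size() >= suffix.size() &&
         symbol.compare(symbol.size() - suffix.size(), suffix.size(),
                        suffix) == 0;
}

// Collects every symbol that might correspond to `source_name`.
inline std::vector<std::string> Candidates(
    const std::vector<std::string>& symbols, const std::string& source_name) {
  std::vector<std::string> out;
  for (const std::string& sym : symbols) {
    if (MayNameVariable(sym, source_name)) {
      out.push_back(sym);
    }
  }
  return out;
}
```

Given the two symbols from the readelf output above, looking up sample_array yields two candidates, so the name alone cannot say which 40 bytes a particular DW_TAG_variable entry describes.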