support for features required by dwp
davidtwco opened this issue
I'm looking into implementing a `dwp` tool using `gimli` (to use as part of rustc's split DWARF support). This requires being able to concatenate the relevant input compilation units into a new `Dwarf` to be written, and to generate CU/TU indices. `gimli` doesn't support writing CU/TU indices, and the current read/write API isn't well suited to copying the compilation units into a new `Dwarf`.
I'm happy to do any required implementation on this, given necessary pointers.
You shouldn't use the read/write API for copying the compilation units, or create a new `Dwarf`. I'm not sure it makes sense for `gimli` to provide anything for copying the units. It's mostly a problem of concatenating ELF sections, so it will depend more on your ELF writer. I suggest using the `object` crate's `write::elf::Writer` for this.
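Here is a rough sketch of what "concatenating sections" means in practice. No DWARF parsing is needed: each input object's section bytes are appended to one output buffer, and the offset and size of each contribution are recorded, because that is what the package index sections later refer to. This is a std-only sketch. `Contribution` and `concatenate_sections` are hypothetical names, and it leaves out the actual ELF output step (for example, via `object`'s writer).

```rust
/// One input's contribution to a concatenated output section (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Contribution {
    /// Byte offset of this input's data within the output section.
    pub offset: u64,
    /// Length in bytes of this input's data.
    pub size: u64,
}

/// Appends each input section's bytes to `output` and records where each one landed.
/// The recorded offsets and sizes are the per-unit data a CU/TU index needs.
pub fn concatenate_sections(inputs: &[&[u8]], output: &mut Vec<u8>) -> Vec<Contribution> {
    let mut contributions = Vec::with_capacity(inputs.len());
    for data in inputs {
        // The contribution starts wherever the output currently ends.
        let offset = output.len() as u64;
        output.extend_from_slice(data);
        contributions.push(Contribution { offset, size: data.len() as u64 });
    }
    contributions
}
```

Because nothing inside the units is rewritten, the only bookkeeping is the running offset. This matches the point that the writer, not `gimli`, does most of the work.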
We need to add support for writing CU/TU indices, but this should be implemented in tandem with the `dwp` tool. It should be mostly separate from the existing write support. You may want to reuse `Writer`. You will need to use the read API to get information from the root DIEs in order to build the index sections.
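The hash table inside the index sections is the least obvious part, so here is a minimal sketch of it. The table consists of two parallel arrays: 8-byte unit signatures and 4-byte row indices. Rows are 1-based, and an empty slot holds 0 in both arrays. The sketch follows the probing scheme described in DWARF 5 section 7.3.5.3 as I understand it: the primary slot comes from the low bits of the signature, and the secondary step is derived from the high 32 bits and forced odd. The sizing rule (smallest power of two strictly greater than 3/2 × the unit count) is an assumption, so check it against the spec. `build_index_hash_table` and `lookup_row` are hypothetical names, not `gimli` API.

```rust
/// Builds the parallel signature and row-index tables of a CU/TU index.
/// Row indices are 1-based; 0 marks an empty slot. Assumes nonzero, deduplicated signatures.
pub fn build_index_hash_table(signatures: &[u64]) -> (Vec<u64>, Vec<u32>) {
    // Assumed sizing: smallest power of two strictly greater than 3/2 * unit count,
    // which always leaves at least one empty slot.
    let mut slot_count: usize = 1;
    while slot_count <= signatures.len() * 3 / 2 {
        slot_count *= 2;
    }
    let mask = (slot_count - 1) as u64;
    let mut sig_table = vec![0u64; slot_count];
    let mut row_table = vec![0u32; slot_count];
    for (row, &sig) in signatures.iter().enumerate() {
        // Primary hash: the low k bits of the signature.
        let mut slot = sig & mask;
        // Secondary hash: bits from the upper half, forced odd so that probing
        // a power-of-two table eventually visits every slot.
        let step = ((sig >> 32) & mask) | 1;
        while row_table[slot as usize] != 0 {
            slot = (slot + step) & mask;
        }
        sig_table[slot as usize] = sig;
        row_table[slot as usize] = row as u32 + 1;
    }
    (sig_table, row_table)
}

/// Looks up a signature the way a consumer would; returns its 1-based row, if present.
pub fn lookup_row(sig_table: &[u64], row_table: &[u32], sig: u64) -> Option<u32> {
    let mask = (sig_table.len() - 1) as u64;
    let mut slot = sig & mask;
    let step = ((sig >> 32) & mask) | 1;
    loop {
        let row = row_table[slot as usize];
        if row == 0 {
            return None; // Hit an empty slot: the signature is not in the table.
        }
        if sig_table[slot as usize] == sig {
            return Some(row);
        }
        slot = (slot + step) & mask;
    }
}
```

In a real index, the rows would then select entries in the offset and size tables built from per-unit contributions.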
You probably also need something to construct the `.debug_str.dwo` section, but I'm not sure if the existing `.debug_str` support is suitable for that.
I haven't looked in depth at what `dwp` does, so I may be missing details in the above.
Doesn't dwp also deduplicate parts of the debuginfo between compilation units? That would require rewriting the compilation units.
Yeah, duplicate type units are removed and the string tables are merged (source: section F.3 of the DWARF 5 specification); everything else is basically just concatenated.
Ok, but that still doesn't require rewriting the compilation/type units. It just means we need to concatenate portions of the sections instead of complete sections. You still don't need to parse or rewrite the entire unit to do that.
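To illustrate "concatenating portions of the sections": type-unit deduplication only needs each unit's signature and its byte range, not a full parse or rewrite. The std-only sketch below keeps the first unit seen for each signature and skips later duplicates. It assumes each unit's signature and byte slice have already been located with the read API. `TypeUnit` and `append_unique_type_units` are hypothetical names.

```rust
use std::collections::HashSet;

/// A type unit found in some input: its 8-byte signature and its raw bytes,
/// already sliced out of that input's section (hypothetical type).
pub struct TypeUnit<'a> {
    pub signature: u64,
    pub bytes: &'a [u8],
}

/// Appends each type unit to `output` unless a unit with the same signature was
/// already written. Returns (signature, offset, size) for every unit kept.
pub fn append_unique_type_units<'a>(
    units: impl IntoIterator<Item = TypeUnit<'a>>,
    seen: &mut HashSet<u64>,
    output: &mut Vec<u8>,
) -> Vec<(u64, u64, u64)> {
    let mut kept = Vec::new();
    for unit in units {
        // `insert` returns false when the signature was already present.
        if seen.insert(unit.signature) {
            let offset = output.len() as u64;
            output.extend_from_slice(unit.bytes);
            kept.push((unit.signature, offset, unit.bytes.len() as u64));
        }
    }
    kept
}
```

Sharing one `seen` set across all input files is what makes the deduplication package-wide rather than per-file.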
F.3 is written as though each unit (in DWARF objects) gets its own `.debug_info.dwo` section. Does LLVM do that or put them all in one?
I’ve only seen DWARF objects with a single `.debug_info.dwo` in my local experimentation, but that might just be because those are generated from fairly simple examples. Package files always have one `.debug_info.dwo` section.
Small update: I've got a working prototype. It still needs work before it's properly usable, but it looks like this is possible. In terms of how I've used `gimli`:
- As per your recommendation, I used `object` for most things I needed to do, so `gimli` wasn't used much except for its constants, ids, and some basic parsing and iteration over compilation/type units.
- I implemented my own `StringTable`. It works quite similarly to `gimli::write::StringTable`, except that I can produce multiple `DebugStrOffsets` from it. I need this to merge the `.debug_str` sections from multiple DWARF object files and then rebuild each object's `.debug_str_offsets` section with offsets into that new merged section.
- When rebuilding the `.debug_str_offsets` sections, I had to read the raw data of the original sections with `EndianSlice` to iterate over the offsets and map them to their new offsets.
- DWARF 5's specification gives `.debug_str_offsets` a header (Sec 7.26, Pg 240). I haven't implemented writing it yet (I will). I initially thought none of the other `dwp` implementations output it, but `llvm-dwp` can write it (I had been looking at the source for released versions). I also initially thought `gimli` couldn't read it, but `gimli`'s `DebugStrOffsetsBase` knows to skip the header.
- The `.debug_{cu,tu}_index` sections are written manually using `EndianVec`. This is mostly straightforward; the only tricky part is the hash table you need to write, but it's not too bad.
- As far as I can tell from reading the documentation on the GNU extension, `.debug_macinfo.dwo` is a valid return value from `SectionId::dwo_name`, but it isn't currently returned.
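The string-table merging and `.debug_str_offsets` rebuilding described in the list can be sketched with std only. In this sketch, a deduplicating table hands out offsets into one merged string section. Each input's offsets are then read, resolved to their strings, and re-emitted as offsets into the merged section. An optional DWARF 5 header (unit_length, version 5, padding) is prepended at the end. This stands in for what the comment describes doing with `gimli::EndianSlice` and a custom `StringTable`, and it assumes 32-bit DWARF, little-endian data, and headerless input offset sections. `MergedStrings`, `rebuild_str_offsets` and `with_dwarf5_header` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical merged string table: deduplicates strings from every input
/// `.debug_str.dwo` into one output section and returns their new offsets.
#[derive(Default)]
pub struct MergedStrings {
    data: Vec<u8>,
    offsets: HashMap<Vec<u8>, u32>,
}

impl MergedStrings {
    /// Adds a string (without its NUL) and returns its offset in the merged section.
    pub fn add(&mut self, s: &[u8]) -> u32 {
        if let Some(&off) = self.offsets.get(s) {
            return off; // Already present: reuse the existing offset.
        }
        let off = self.data.len() as u32;
        self.data.extend_from_slice(s);
        self.data.push(0); // Strings in .debug_str are NUL-terminated.
        self.offsets.insert(s.to_vec(), off);
        off
    }

    pub fn data(&self) -> &[u8] {
        &self.data
    }
}

/// Reads the NUL-terminated string starting at `offset` in an input string section.
fn read_cstr(section: &[u8], offset: usize) -> Option<&[u8]> {
    let rest = section.get(offset..)?;
    let end = rest.iter().position(|&b| b == 0)?;
    Some(&rest[..end])
}

/// Rewrites one input's string offsets (assumed: 32-bit, little-endian, no header)
/// so they point into the merged string section. Returns None on malformed input.
pub fn rebuild_str_offsets(old_offsets: &[u8], old_strings: &[u8], merged: &mut MergedStrings) -> Option<Vec<u8>> {
    if old_offsets.len() % 4 != 0 {
        return None;
    }
    let mut out = Vec::with_capacity(old_offsets.len());
    for chunk in old_offsets.chunks_exact(4) {
        let old = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        let s = read_cstr(old_strings, old as usize)?;
        out.extend_from_slice(&merged.add(s).to_le_bytes());
    }
    Some(out)
}

/// Prepends a DWARF 5 string-offsets header (32-bit format, little-endian assumed).
pub fn with_dwarf5_header(entries: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(entries.len() + 8);
    // unit_length counts the bytes after itself: version (2) + padding (2) + entries.
    out.extend_from_slice(&((entries.len() + 4) as u32).to_le_bytes());
    out.extend_from_slice(&5u16.to_le_bytes()); // version
    out.extend_from_slice(&0u16.to_le_bytes()); // padding
    out.extend_from_slice(entries);
    out
}
```

Processing inputs in order against one shared `MergedStrings` is what lets a string such as `bar`, which appears in two objects, end up stored once.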
All in all, the lower-level APIs that `gimli`/`object` expose for reading and writing are quite nice to use and made things a lot simpler. It's also helpful that `gimli` can read the DWARF 5 package file format, which my local versions of `readelf`/`objdump` can't. There's still plenty I'm unfamiliar with w/r/t DWARF, so some of what I've said above might be incorrect.
I'll close this because I've implemented the tool now (available at davidtwco/thorin). There are probably some things we could upstream to `gimli`, but we can make separate issues/pull requests for those rather than keep this large, vague issue open.