gimli-rs / gimli

A library for reading and writing the DWARF debugging format

Home Page: https://docs.rs/gimli/

support for features required by dwp

davidtwco opened this issue

I'm looking into implementing a dwp tool using gimli (for use as part of rustc's split DWARF support). This requires being able to concatenate the relevant input compilation units into a new Dwarf to be written, and to generate cu/tu indices. gimli doesn't support writing cu/tu indices, and the current read/write API isn't well suited to copying compilation units into a new Dwarf.

I'm happy to do any required implementation on this, given necessary pointers.

You shouldn't use the read/write API for copying the compilation units, or create a new Dwarf. I'm not sure it makes sense for gimli to provide anything for copying the units. It's mostly a problem of concatenating ELF sections, so it will depend more on your ELF writer. I suggest using the object crate's write::elf::Writer for this.

We need to add support for writing cu/tu indices, but this should be implemented in tandem with the dwp tool. It should be mostly separate from the existing write support. You may want to reuse Writer. You will need to use the read API to get information from the root DIEs in order to build the index sections.
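For reference, the hash table at the heart of the cu/tu index sections can be sketched roughly as follows. This is a minimal, std-only illustration based on a reading of DWARF 5 section 7.3.5.3 (open addressing with double hashing, slot count a power of two greater than 3N/2). It is not gimli's API: `bucket_count`, `build_hash_table` and `lookup` are hypothetical names, and the sketch assumes the signatures are unique and treats a row number of 0 as "slot unused".

```rust
/// Number of hash slots for `n` units: the smallest power of two
/// strictly greater than 3 * n / 2.
fn bucket_count(n: usize) -> usize {
    let mut m = 1;
    while m <= 3 * n / 2 {
        m *= 2;
    }
    m
}

/// Builds the two parallel arrays of a unit index: the signature slots and
/// the 1-based row numbers into the offset/size tables.
/// Assumption: `ids` holds no duplicates; an unused slot is 0 in both arrays.
fn build_hash_table(ids: &[u64]) -> (Vec<u64>, Vec<u32>) {
    let m = bucket_count(ids.len());
    let mask = m as u64 - 1;
    let mut signatures = vec![0u64; m];
    let mut rows = vec![0u32; m];
    for (i, &id) in ids.iter().enumerate() {
        // Primary hash: low bits of the signature.
        let mut slot = (id & mask) as usize;
        // Secondary hash: high bits, forced odd so every slot is visited.
        let step = (((id >> 32) & mask) | 1) as usize;
        while rows[slot] != 0 {
            slot = (slot + step) % m;
        }
        signatures[slot] = id;
        rows[slot] = (i + 1) as u32;
    }
    (signatures, rows)
}

/// Looks a signature up using the same probe sequence as the builder.
fn lookup(signatures: &[u64], rows: &[u32], id: u64) -> Option<u32> {
    let m = signatures.len();
    let mask = m as u64 - 1;
    let mut slot = (id & mask) as usize;
    let step = (((id >> 32) & mask) | 1) as usize;
    loop {
        if rows[slot] == 0 {
            return None; // reached an unused slot: not present
        }
        if signatures[slot] == id {
            return Some(rows[slot]);
        }
        slot = (slot + step) % m;
    }
}
```

Because the slot count is a power of two and the step is odd, the probe sequence visits every slot, and since there are always more slots than units, both loops terminate.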

You probably also need something to construct the .debug_str.dwo section, but I'm not sure if the existing .debug_str support is suitable for that.

I haven't looked in depth at what dwp does, so I may be missing details in the above.

Doesn't dwp also deduplicate parts of the debuginfo between compilation units? That would require rewriting the compilation units.

Yeah, duplicate type units are removed and the string tables are merged (source: section F.3 of the DWARF5 specification); everything else is basically just concatenated.

Ok, but that still doesn't require rewriting the compilation/type units. It just means we need to concatenate portions of the sections instead of complete sections. You still don't need to parse or rewrite the entire unit to do that.
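The "concatenate portions of the sections" idea above can be sketched as follows. This is a hedged, std-only illustration rather than anything gimli or object provides: `InputTypeUnit`, `Contribution` and `append_unique_type_units` are hypothetical names, and the sketch assumes the caller has already read each unit's header to obtain its type signature and the byte range of the whole unit. The unit bodies are copied verbatim and never parsed.

```rust
use std::collections::HashSet;

/// One type unit as found in an input object: its 64-bit type signature and
/// the raw bytes of the unit, header included (hypothetical type).
struct InputTypeUnit<'a> {
    signature: u64,
    bytes: &'a [u8],
}

/// Where a kept unit landed in the output section; this is the kind of
/// information an index row would later need (hypothetical type).
#[derive(Debug, PartialEq)]
struct Contribution {
    signature: u64,
    offset: u32,
    size: u32,
}

/// Appends each unit to `out` unless a unit with the same signature was
/// already copied, and records the offset and size of every kept unit.
fn append_unique_type_units(
    units: &[InputTypeUnit<'_>],
    out: &mut Vec<u8>,
) -> Vec<Contribution> {
    let mut seen = HashSet::new();
    let mut contributions = Vec::new();
    for unit in units {
        // `insert` returns false when the signature was seen before.
        if !seen.insert(unit.signature) {
            continue;
        }
        contributions.push(Contribution {
            signature: unit.signature,
            offset: out.len() as u32,
            size: unit.bytes.len() as u32,
        });
        out.extend_from_slice(unit.bytes);
    }
    contributions
}
```

Only the first unit with a given signature is kept here; which duplicate a real tool should keep is a design choice this sketch does not address.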

F.3 is written as though each unit (in dwarf objects) gets its own .debug_info.dwo section. Does LLVM do that or put them all in one?

I’ve only seen dwarf objects with a single .debug_info.dwo in my local experimentation, but that might just be because those are generated from fairly simple examples. Package files always have one .debug_info.dwo section.

Small update: I've got a working prototype. It still needs work before it's properly usable, but it looks like this is possible. In terms of how I've used gimli:

  • As per your recommendation, I used object for most things I needed to do, so gimli wasn't used too much except for its constants, ids and some basic parsing and iteration on compilation/type units.
  • I implemented my own StringTable. It works quite similarly to gimli::write::StringTable, except that I can produce multiple DebugStrOffsets from it. I need this to merge the .debug_str sections from multiple DWARF object files and then rebuild the .debug_str_offsets section from each, with the offsets pointing into that new section.
    • When rebuilding the .debug_str_offsets sections, I had to read the raw data of the original sections with EndianSlice to iterate over the offsets and map them to their new offsets.
    • The DWARF 5 specification gives .debug_str_offsets a header. None of the other dwp implementations output it, and I haven't implemented that either (yet, I will). llvm-dwp can write this, I was looking at the source for released versions. As far as I can tell, gimli doesn't have support for reading this (Sec 7.26, Pg 240). Looks like gimli's DebugStrOffsetsBase knows to skip the header.
  • .debug_{cu,tu}_index are written manually using EndianVec. This is mostly straightforward; the only tricky part is the hash table that you need to write, but it's not too bad.
  • As far as I can tell from reading the documentation on the GNU extension, .debug_macinfo.dwo is a valid return value from SectionId::dwo_name, but won't be returned currently.
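The string-table merging and offset remapping described in the list above could look roughly like this. It is a std-only sketch, not the prototype's code and not gimli's StringTable: `MergedStrings`, `read_str` and `remap_str_offsets` are hypothetical names, and the sketch assumes 32-bit DWARF, little-endian data, and offset sections without a header.

```rust
use std::collections::HashMap;

/// A merged string section that stores each distinct NUL-terminated string
/// once (hypothetical type).
#[derive(Default)]
struct MergedStrings {
    data: Vec<u8>,
    offsets: HashMap<Vec<u8>, u32>,
}

impl MergedStrings {
    /// Adds `s` (without its NUL terminator) and returns its offset in the
    /// merged section, reusing the existing offset for a repeated string.
    fn add(&mut self, s: &[u8]) -> u32 {
        if let Some(&offset) = self.offsets.get(s) {
            return offset;
        }
        let offset = self.data.len() as u32;
        self.data.extend_from_slice(s);
        self.data.push(0);
        self.offsets.insert(s.to_vec(), offset);
        offset
    }
}

/// Reads the NUL-terminated string starting at `offset`, if there is one.
fn read_str(section: &[u8], offset: usize) -> Option<&[u8]> {
    let rest = section.get(offset..)?;
    let len = rest.iter().position(|&b| b == 0)?;
    Some(&rest[..len])
}

/// Rewrites one input's string offsets so they point into `merged`.
/// Assumptions: 4-byte little-endian offsets and no section header.
/// Returns None if an offset does not point at a terminated string.
fn remap_str_offsets(
    old_str: &[u8],
    old_offsets: &[u8],
    merged: &mut MergedStrings,
) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(old_offsets.len());
    for chunk in old_offsets.chunks_exact(4) {
        let old = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        let s = read_str(old_str, old as usize)?;
        out.extend_from_slice(&merged.add(s).to_le_bytes());
    }
    Some(out)
}
```

Calling `remap_str_offsets` once per input object with a shared `MergedStrings` yields one rewritten offsets blob per input plus a single merged string section in `merged.data`.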

All in all, the lower-level APIs that gimli/object expose for reading and writing are quite nice to use, and they made things a lot simpler. It's also helpful that gimli can read the DWARF 5 package file format; my local versions of readelf/objdump can't. There's still plenty I'm unfamiliar with w/r/t DWARF, so some of what I've said above might be incorrect.

I'll close this because I've implemented the tool now (available at davidtwco/thorin). There are probably some things we could upstream to gimli, but we can make separate issues/pull requests for those rather than keep this large, vague issue open.