google / bloaty

Bloaty: a size profiler for binaries

Direction for PE file support

learn-more opened this issue

To add support for PE files there are a few different approaches that can be used:

  • Import a submodule for LIEF (this also supports other file formats; maybe all parsers could leverage it?)
  • Import a submodule for a parser like pe-parse which is more of an iterating parser, and specialized for PE files
  • Grab some headers / classes from LLVM and write something along the lines of COFFDump
  • Grab only the PE headers from LLVM and write a complete custom parser
  • Other (please specify)

What would be the preferred way of moving forward?

Thanks for your interest in adding PE support! This is something I've wished for for a while.

Generally with Bloaty I have found that custom parsers are necessary. Bloaty cares about not only the data in the file, but the precise location of each bit of data in the file. For example, for the file headers and symbol table entries, we need to not only read them, but report their byte range within the file.

Generally I've found that existing libraries do not offer this information, because almost no program besides Bloaty needs it. For this reason, all of the existing parsers in Bloaty take the final approach you mentioned (grab the headers and write a complete custom parser). I expect PE will probably require the same.
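
To illustrate the byte-range point, here is a minimal Python sketch (not Bloaty's actual code, which is C++). It reads a COFF file header and reports the file range of the header itself and of the symbol table it points to. The function name `coff_ranges` and the synthetic input are made up for illustration; the 20-byte header layout and 18-byte symbol entry size are taken from the PE/COFF format.

```python
import struct

# COFF file header: Machine, NumberOfSections, TimeDateStamp,
# PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
COFF_HEADER = struct.Struct("<HHIIIHH")  # 20 bytes, little-endian
COFF_SYMBOL_SIZE = 18                    # size of one symbol table entry


def coff_ranges(data, header_offset=0):
    """Return {label: (start, end)} byte ranges; `end` is exclusive.

    Hypothetical helper: it reports where things live in the file,
    which is the part most general-purpose parsers do not expose.
    """
    fields = COFF_HEADER.unpack_from(data, header_offset)
    symtab_offset, num_symbols = fields[3], fields[4]

    ranges = {"COFF header": (header_offset, header_offset + COFF_HEADER.size)}
    if symtab_offset and num_symbols:
        ranges["symbol table"] = (
            symtab_offset,
            symtab_offset + num_symbols * COFF_SYMBOL_SIZE,
        )
    return ranges


# Synthetic input: a header followed by two zeroed symbol entries at offset 20.
header = COFF_HEADER.pack(0x8664, 0, 0, 20, 2, 0, 0)
blob = header + bytes(2 * COFF_SYMBOL_SIZE)
ranges = coff_ranges(blob)
print(ranges)
```

A real parser would also bounds-check every offset against the file size before reporting a range.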

@haberman now that the initial PR is merged, how do you want to proceed with PE support?

Now that we have the lit testing in place, I'm a lot more comfortable moving forward with expanding PE support.

I'd love to see support for:

  • segments: these would be the regions of the file that the loader will load. The name "segments" is somewhat ELF-specific, but I think PE has something similar, like in the optional header?
  • symbols: using the symbol table, hopefully we could get some good symbol support here.
  • compileunits: I assume PE files have this information available for debugging?

What do you think?

segments seems very doable. The PE headers can be split into:

  • DOS Header
  • DOS Stub
  • Rich Header? (not sure whether this can be detected reliably; otherwise it will be lumped in with the DOS Stub)
  • NT header
  • Per-section header (for each .text, .rdata etc section)
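
The split above can be sketched in Python as follows. This is only an illustration under stated assumptions, not Bloaty's implementation: `pe_header_ranges` is a hypothetical name, the input is a synthetic image, and the Rich header is not detected, so it would be counted as part of the DOS Stub. The fixed sizes (64-byte DOS header, `e_lfanew` at offset 0x3C, 20-byte COFF header, 40-byte section headers) come from the PE/COFF format.

```python
import struct

DOS_HEADER_SIZE = 64
COFF_HEADER = struct.Struct("<HHIIIHH")  # 20-byte COFF file header
SECTION_HEADER_SIZE = 40


def pe_header_ranges(data):
    """Return a list of (label, start, end) ranges; `end` is exclusive."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: file offset of the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    coff = COFF_HEADER.unpack_from(data, e_lfanew + 4)
    num_sections, opt_header_size = coff[1], coff[5]
    # NT headers = signature + COFF file header + optional header.
    nt_end = e_lfanew + 4 + COFF_HEADER.size + opt_header_size

    ranges = [
        ("DOS Header", 0, DOS_HEADER_SIZE),
        ("DOS Stub", DOS_HEADER_SIZE, e_lfanew),  # includes any Rich header
        ("NT Headers", e_lfanew, nt_end),
    ]
    for i in range(num_sections):
        start = nt_end + i * SECTION_HEADER_SIZE
        name = data[start:start + 8].rstrip(b"\0").decode("ascii", "replace")
        ranges.append((f"Section Header ({name})", start, start + SECTION_HEADER_SIZE))
    return ranges


# Synthetic image: DOS header, 64-byte stub, NT headers, one section header.
dos = b"MZ" + bytes(58) + struct.pack("<I", 128)
stub = bytes(64)
nt = b"PE\0\0" + COFF_HEADER.pack(0x8664, 1, 0, 0, 0, 240, 0) + bytes(240)
section = b".text\0\0\0" + bytes(32)
image = dos + stub + nt + section

header_ranges = pe_header_ranges(image)
for label, start, end in header_ranges:
    print(f"{label}: [{start:#x}, {end:#x})")
```

Because each range ends where the next begins, every byte of the header region is attributed to exactly one label.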

As for symbols: these are usually stored in a PDB file, which at least yaml2obj does not support, and which would require another (extra) parser.
PE files with DWARF information should be doable, but these are only GCC-built binaries, and those are not common outside of a few hobby projects.

compileunits: I have no clue, to be honest, but if this information is present anywhere, it would probably also be in the PDB file.