Direction for PE file support

Question

learn-more opened this issue 3 years ago

Mark Jansen commented 3 years ago

To add support for PE files there are a few different approaches that can be used:

Import a submodule for LIEF (this also supports other files, maybe all parsers could leverage this?)
Import a submodule for a parser like pe-parse, which is more of an iterating parser and is specialized for PE files
Grab some headers / classes from LLVM and write something along the lines of COFFDump
Grab only the PE headers from LLVM and write a complete custom parser
Other (please specify)

What would be the preferred way of moving forward?

Joshua Haberman · Answer 1 · Sun Dec 06 2020 01:49:15 GMT+0800 (China Standard Time)

Thanks for your interest in adding PE support! This is something I've wanted for a while.

Generally with Bloaty I have found that custom parsers are necessary. Bloaty cares about not only the data in the file, but the precise location of each bit of data in the file. For example, for the file headers and symbol table entries, we need to not only read them, but report their byte range within the file.

Generally I've found that existing libraries do not offer this information, because almost no program besides Bloaty needs it. For this reason, all of the existing parsers in Bloaty take the final approach you mentioned (grab the headers and write a complete custom parser). I expect PE will probably require the same.
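To make the "data plus location" requirement concrete, here is a minimal sketch in Python (not Bloaty's actual C++ code). It reads a COFF file header and returns both the decoded fields and the byte range they occupy, then derives the byte range of the COFF symbol table. The names `Ranged`, `read_coff_header`, and `symbol_table_range` are hypothetical and chosen only for illustration. The field layout follows the standard 20-byte `IMAGE_FILE_HEADER` and 18-byte COFF symbol records.

```python
import struct
from typing import NamedTuple, Optional, Tuple


class Ranged(NamedTuple):
    """A parsed value together with the file range it was read from."""
    value: dict
    start: int  # file offset of the first byte
    end: int    # file offset one past the last byte


COFF_HEADER = struct.Struct("<HHIIIHH")  # IMAGE_FILE_HEADER, 20 bytes
COFF_SYMBOL_SIZE = 18                    # size of one COFF symbol record

FIELD_NAMES = (
    "machine", "number_of_sections", "time_date_stamp",
    "pointer_to_symbol_table", "number_of_symbols",
    "size_of_optional_header", "characteristics",
)


def read_coff_header(data: bytes, offset: int) -> Ranged:
    """Decode the COFF header at `offset` and remember where it lives."""
    fields = COFF_HEADER.unpack_from(data, offset)
    return Ranged(dict(zip(FIELD_NAMES, fields)),
                  offset, offset + COFF_HEADER.size)


def symbol_table_range(header: dict) -> Optional[Tuple[int, int]]:
    """Return the (start, end) file range of the symbol table, if any."""
    start = header["pointer_to_symbol_table"]
    count = header["number_of_symbols"]
    if start == 0 or count == 0:
        return None  # no COFF symbol table in this file
    return (start, start + count * COFF_SYMBOL_SIZE)
```

A general-purpose library would typically expose only the decoded fields. A size profiler also needs the `start`/`end` pair so that every byte of the file can be attributed to something.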

Mark Jansen · Answer 2 · Mon May 03 2021 17:18:50 GMT+0800 (China Standard Time)

@haberman now that the initial PR is merged, how do you want to proceed with PE support?

Joshua Haberman · Answer 3 · Thu Aug 05 2021 04:59:03 GMT+0800 (China Standard Time)

Now that we have the lit testing in place, I'm a lot more comfortable moving forward with expanding PE support.

I'd love to see support for:

segments: this would be the regions of the file that the loader will load. The name "segments" is somewhat ELF-specific, but I think PE has something similar, like in the optional header?
symbols: using the symbol table, hopefully we could get some good symbol support here.
compileunits: I assume PE files have this information available for debugging?

What do you think?

Mark Jansen · Answer 4 · Thu Aug 05 2021 14:53:00 GMT+0800 (China Standard Time)

segments seems very doable. The PE header can be split into:

DOS Header
DOS Stub
Rich Header? (not sure if this can be done reliably; otherwise it will be lumped in with the DOS Stub)
NT header
Per-section header (for each .text, .rdata etc section)
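As a rough illustration of the split above, here is a minimal Python sketch that computes the byte range of each header region from the raw file bytes. It is not taken from Bloaty: `header_regions` and the region labels are hypothetical, and the Rich header is deliberately not detected, so it stays inside the DOS Stub range. The sizes used are the standard ones: a 64-byte DOS header with `e_lfanew` at offset 0x3C, a 4-byte PE signature, a 20-byte COFF header, and 40-byte section headers.

```python
import struct
from typing import List, Tuple

DOS_HEADER_SIZE = 64                     # IMAGE_DOS_HEADER
E_LFANEW_OFFSET = 0x3C                   # field pointing at the NT headers
COFF_HEADER = struct.Struct("<HHIIIHH")  # IMAGE_FILE_HEADER, 20 bytes
SECTION_HEADER_SIZE = 40                 # IMAGE_SECTION_HEADER

Region = Tuple[str, int, int]  # (label, start, end), end is exclusive


def header_regions(data: bytes) -> List[Region]:
    """Label the header regions of a PE file with their byte ranges."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    coff = COFF_HEADER.unpack_from(data, e_lfanew + 4)
    num_sections, optional_size = coff[1], coff[5]
    # NT headers = signature + COFF header + optional header.
    nt_end = e_lfanew + 4 + COFF_HEADER.size + optional_size

    regions: List[Region] = [("DOS Header", 0, DOS_HEADER_SIZE)]
    if e_lfanew > DOS_HEADER_SIZE:
        # Assumption: any Rich header is left inside the stub range.
        regions.append(("DOS Stub", DOS_HEADER_SIZE, e_lfanew))
    regions.append(("NT Header", e_lfanew, nt_end))

    # The section table follows the optional header directly.
    for i in range(num_sections):
        start = nt_end + i * SECTION_HEADER_SIZE
        name = data[start:start + 8].rstrip(b"\0").decode("ascii", "replace")
        regions.append((f"Section Header ({name})",
                        start, start + SECTION_HEADER_SIZE))
    return regions
```

Because each region is a half-open `(start, end)` range and the ranges are contiguous, the sizes sum to the total header size with nothing counted twice.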

As for symbols: these are usually stored in a PDB file, which at least yaml2obj does not support, and which would require another (extra) parser.
PE files with DWARF support should be doable, but these are only gcc-built binaries, which are not common outside a few hobby projects.

compileunits: I have no clue, to be honest, but if this is present anywhere, it would probably also be in the PDB file.