eliben / pyelftools

Parsing ELF and DWARF in Python

Not compatible with XC16-compiled ELF files

yashagarwal-314 opened this issue

I am trying to use pyelftools to get variable data types from a dsPIC33CK ELF file, and it doesn't work.

Can you share one of the offending files? If they are sensitive, can you build a dummy project with the same toolchain and settings and, if the problem reproduces, share that?

Also, please share what exactly you are doing with pyelftools. Does the file fail to open, or does something go wrong later, during debug info parsing/navigation?

The attachment didn't go through. Please navigate to the issue at Github and attach it there.

https://drive.google.com/file/d/1oTaSIvRCXfMlEzVjOPPxdk4UA4pS4Bm_/view?usp=drive_link

Hello @sevaa,

Thank you for your message. I have uploaded it to Google Drive, so now you should be able to access it.

Thank you for your support.

No immediate access. Requested.

I see it now. The DWARF format seems broken. GNU readelf chokes on that file too:

readelf: Warning: Corrupt unit length (0x300ac) found in section .debug_info
readelf: Warning: Corrupt unit length (0x300ac) found in section .debug_info

The bytes on the very top of the .debug_info don't look like a valid CU header. It goes:

uint32 unit_length:   AC 00 03 00
uint16 version:       00 00
uint32 abbrev_offset: 00 00 02 00
uint8  address_size:  00

Zero is not a valid value for either the version or the address size.
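For illustration, here is a minimal sketch that unpacks those 11 bytes as a standard little-endian, 32-bit-format DWARF v2-v4 CU header and applies two simplified sanity checks. The function names are hypothetical, and the checks are a reduced stand-in for what readelf or pyelftools actually do (DWARF 5 reorders the header fields and is not handled here).

```python
import struct

def parse_cu_header_v2_4(buf: bytes, offset: int = 0):
    """Unpack a little-endian, 32-bit-format DWARF v2-v4 CU header.

    Layout (11 bytes): uint32 unit_length, uint16 version,
    uint32 debug_abbrev_offset, uint8 address_size.
    A unit_length of 0xffffffff would instead signal the 64-bit format.
    """
    return struct.unpack_from("<IHIB", buf, offset)

def cu_header_problems(buf: bytes, offset: int = 0):
    """Return reasons why the header looks invalid (simplified checks only)."""
    _length, version, _abbrev_offset, address_size = parse_cu_header_v2_4(buf, offset)
    problems = []
    if version not in (2, 3, 4):
        problems.append(f"version {version} is not a supported v2-v4 value")
    if address_size == 0:
        problems.append("address_size is 0")
    return problems

# The first 11 bytes of .debug_info quoted above.
header = bytes.fromhex("ac000300" "0000" "00000200" "00")
print(parse_cu_header_v2_4(header))  # (196780, 0, 131072, 0)
print(cu_header_problems(header))
```

196780 is 0x300ac, the same "corrupt unit length" readelf reports.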

Hey Sevaa,

Unfortunately, I have no control over the ELF file, but this project is very important to me, and it would be great if you could guide me through a workaround, or make some changes in the library so that pyelftools works for this kind of file as well.

Thank you!

Is it possible that the file is intentionally obfuscated to prevent the kind of analysis you are trying to do?

Hey Sevaa,

Thank you for your message!

Please let me know if you need any further information, thank you!

If you have a version of readelf that can parse and dump the debug info, I suggest that you dump the DWARF info into a text file (use readelf -wi) and parse that. That should be enough to recover variable datatypes. You'll have to do some DIE reference chasing.
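A minimal sketch of that DIE reference chasing in Python, assuming the dump follows the usual GNU readelf layout shown in the comments. A vendor readelf may format lines differently, so check a few DIEs by hand first. The regexes, function names, and the simplified C-like type rendering (which, for example, renders a const pointer the same way as a pointer to const) are illustrative assumptions, not part of any tool.

```python
import re

# Assumed shapes of GNU-style `readelf -wi` lines (verify against your dump):
#  <1><2d>: Abbrev Number: 2 (DW_TAG_variable)
#     <2e>   DW_AT_name        : counter
#     <32>   DW_AT_type        : <0x3f>
DIE_RE = re.compile(r"^\s*<(\d+)><([0-9a-f]+)>: Abbrev Number: \d+(?: \((DW_TAG_\w+)\))?")
ATTR_RE = re.compile(r"^\s*<[0-9a-f]+>\s+(DW_AT_\w+)\s*:\s*(.*)$")

def parse_dies(text):
    """Map each DIE offset to a dict holding its tag and raw attribute strings."""
    dies, current = {}, None
    for line in text.splitlines():
        m = DIE_RE.match(line)
        if m:
            current = None  # "Abbrev Number: 0" is a null DIE: no tag, no attributes
            if m.group(3):
                current = {"tag": m.group(3)}
                dies[int(m.group(2), 16)] = current
            continue
        m = ATTR_RE.match(line)
        if m and current is not None:
            value = m.group(2).strip()
            if value.startswith("(indirect string"):
                # "(indirect string, offset: 0x1a): limits" -> "limits"
                value = value.split("): ", 1)[-1]
            current[m.group(1)] = value
    return dies

def type_ref(die):
    """Decode a DW_AT_type value such as "<0x3f>" into an integer DIE offset."""
    ref = die.get("DW_AT_type")
    return int(ref.strip("<>"), 16) if ref else None

# Modifier tags rendered around the name of the type they reference.
PREFIX = {"DW_TAG_const_type": "const ", "DW_TAG_volatile_type": "volatile "}
SUFFIX = {"DW_TAG_pointer_type": " *", "DW_TAG_array_type": "[]"}
AGGREGATE = {"DW_TAG_structure_type": "struct ", "DW_TAG_union_type": "union ",
             "DW_TAG_enumeration_type": "enum "}

def type_name(dies, offset):
    """Chase DW_AT_type references until a named type (or void) is reached."""
    if offset is None:
        return "void"  # a missing DW_AT_type means void
    die = dies[offset]
    tag = die["tag"]
    if tag in PREFIX:
        return PREFIX[tag] + type_name(dies, type_ref(die))
    if tag in SUFFIX:
        return type_name(dies, type_ref(die)) + SUFFIX[tag]
    # Base types and typedefs stop at their own name.
    return AGGREGATE.get(tag, "") + die.get("DW_AT_name", "<anonymous>")

def variable_types(text):
    """Return {variable name: C-like type string} for every named variable DIE."""
    dies = parse_dies(text)
    return {d["DW_AT_name"]: type_name(dies, type_ref(d))
            for d in dies.values()
            if d["tag"] == "DW_TAG_variable" and "DW_AT_name" in d}

# Hand-written sample in the assumed format (not taken from the OP's file).
SAMPLE = """\
 <1><2d>: Abbrev Number: 2 (DW_TAG_base_type)
    <2e>   DW_AT_byte_size   : 2
    <30>   DW_AT_name        : int
 <1><34>: Abbrev Number: 3 (DW_TAG_const_type)
    <35>   DW_AT_type        : <0x2d>
 <1><39>: Abbrev Number: 4 (DW_TAG_pointer_type)
    <3b>   DW_AT_type        : <0x34>
 <1><3f>: Abbrev Number: 5 (DW_TAG_variable)
    <40>   DW_AT_name        : (indirect string, offset: 0x1a): limits
    <44>   DW_AT_type        : <0x39>
"""
print(variable_types(SAMPLE))  # {'limits': 'const int *'}
```

Keeping attributes as raw strings keeps the parser tolerant of formatting differences; only DW_AT_name and DW_AT_type need decoding for datatype recovery.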

While it might be a fascinating project to figure out this flavor of DWARF, I don't think I can commit to that without knowing the scope. Someone else might, but I don't see much enthusiasm here.

Closing this: parsing malformed DWARF that GNU tooling chokes on is not a task pyelftools is designed for.

Okay, some data points.

When the compiler vendors say theirs is a 16-bit machine, they take it seriously :) It looks like in this binary's flavor of DWARF, the standard [U]LEB128 integer encoding has been replaced with fixed-width uint16 where possible.
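To make the difference concrete, here is a small sketch contrasting a standard ULEB128 reader with the fixed-width uint16 read that this flavor appears to use (little-endian assumed). A value like 1 takes one byte in ULEB128 but two bytes here, which is enough to throw a standard parser out of alignment after the first field. The function names are illustrative only.

```python
import struct

def read_uleb128(buf: bytes, pos: int):
    """Standard DWARF ULEB128: 7 payload bits per byte, high bit = more bytes follow."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos

def read_u16(buf: bytes, pos: int):
    """The replacement observed in this binary: one little-endian uint16."""
    (value,) = struct.unpack_from("<H", buf, pos)
    return value, pos + 2

print(read_uleb128(bytes([0xE5, 0x8E, 0x26]), 0))  # (624485, 3)
print(read_uleb128(bytes([0x01]), 0))              # (1, 1)
print(read_u16(bytes([0x01, 0x00]), 0))            # (1, 2)
```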

My starting point was the abbrev table. Normally, one consists mostly of ULEB128 numbers: an abbreviation record contains a header with the code, the tag, and a uint8 has-children flag, followed by a set of (attribute, form) pairs terminated by a null pair. If the contents of the abbrev section in this binary are interpreted as all uint16s instead (even the has-children flag), the top of it looks like a sensible abbrev:

Code 1
DW_TAG_compile_unit
Has children: yes
DW_AT_producer DW_FORM_string
DW_AT_language DW_FORM_data1 (but it's uint16 in the DIE anyway)
DW_AT_name DW_FORM_string
DW_AT_comp_dir DW_FORM_string
DW_AT_low_pc DW_FORM_addr
DW_AT_high_pc DW_FORM_addr
DW_AT_stmt_list DW_FORM_data4
0 0
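Under that reading, an abbrev table parser only needs to swap every ULEB128/uint8 read for a uint16 read. A minimal sketch, assuming little-endian uint16 fields and a zero code terminating the table; `parse_abbrevs_u16` is a hypothetical name, not a pyelftools API. The test data encodes the abbrev listed above using the standard DWARF constants (DW_TAG_compile_unit = 0x11, DW_AT_producer = 0x25, DW_FORM_string = 0x08, and so on).

```python
import struct

def parse_abbrevs_u16(data: bytes, offset: int = 0):
    """Parse one abbrev table in which every field is a little-endian uint16.

    Returns {code: (tag, has_children, [(attribute, form), ...])}.
    """
    def words():
        pos = offset
        while pos + 2 <= len(data):
            yield struct.unpack_from("<H", data, pos)[0]
            pos += 2

    it = words()
    table = {}
    for code in it:
        if code == 0:  # a zero code ends the table
            break
        tag = next(it)
        has_children = next(it) != 0  # even this flag is a uint16 here
        specs = []
        while True:
            attr, form = next(it), next(it)
            if attr == 0 and form == 0:  # null pair ends the attribute list
                break
            specs.append((attr, form))
        table[code] = (tag, has_children, specs)
    return table

# Abbrev 1 from the listing above, re-encoded as uint16s.
words_ = [1, 0x11, 1,
          0x25, 0x08, 0x13, 0x0B, 0x03, 0x08, 0x1B, 0x08,
          0x11, 0x01, 0x12, 0x01, 0x10, 0x06, 0, 0,
          0]
abbrev_bytes = struct.pack("<%dH" % len(words_), *words_)
print(parse_abbrevs_u16(abbrev_bytes))
```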

Meanwhile, in the info section there is a UTF-16 string starting with "GNU C..." at offset 0x18: clearly the value of the producer attribute. Notably, it's preceded by uint16 0x1, which looks a lot like the abbrev code, and followed by uint16 0x1 (DW_LANG_C89), then another UTF-16 string that looks like a filename. Clearly DIE values.
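A sketch of reading the start of that top DIE according to abbrev 1, assuming each string is NUL-terminated and stored as one little-endian uint16 per character. The helper names and the sample bytes ("GNU C", "main.c") are hypothetical placeholders, not data from the OP's file.

```python
import struct

def read_u16(buf: bytes, pos: int):
    (value,) = struct.unpack_from("<H", buf, pos)
    return value, pos + 2

def read_wide_string(buf: bytes, pos: int):
    """Read a NUL-terminated string stored as one little-endian uint16 per char."""
    chars = []
    while True:
        unit, pos = read_u16(buf, pos)
        if unit == 0:
            return "".join(chars), pos
        chars.append(chr(unit))

def read_cu_die_prefix(buf: bytes, pos: int):
    """Read abbrev code, DW_AT_producer, DW_AT_language, DW_AT_name (abbrev 1)."""
    code, pos = read_u16(buf, pos)
    producer, pos = read_wide_string(buf, pos)
    language, pos = read_u16(buf, pos)  # DW_FORM_data1, but a uint16 in the DIE
    name, pos = read_wide_string(buf, pos)
    return {"code": code, "producer": producer, "language": language, "name": name}, pos

# Hypothetical DIE bytes shaped like the ones described above.
sample = (struct.pack("<H", 1) + "GNU C".encode("utf-16-le") + b"\0\0"
          + struct.pack("<H", 1) + "main.c".encode("utf-16-le") + b"\0\0")
print(read_cu_die_prefix(sample, 0))
```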

EDIT: the CU header structure is unusual, too. The CU length is 8 bytes, arranged as 4 uint16s. From the corpus that I'm seeing, the length is word0*2 + word1*512; words 2 and 3 are zero throughout.
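Applied to the header bytes quoted earlier (AC 00 03 00 00 00 00 00), the formula gives 0xAC*2 + 3*512 = 344 + 1536 = 1880. A minimal sketch follows; the function name is hypothetical, and whether the result counts from the start of the header or from after the length field is not established in this thread. Note that word0*2 + word1*512 equals 2*(word0 + 256*word1), which is consistent with each logical byte of a normal 4-byte length occupying its own uint16.

```python
import struct

def cu_length_from_words(header8: bytes) -> int:
    """Empirical CU length formula from this thread: word0*2 + word1*512."""
    w0, w1, w2, w3 = struct.unpack("<4H", header8)
    # Words 2 and 3 were zero throughout the observed corpus; refuse to guess otherwise.
    if w2 or w3:
        raise ValueError("words 2/3 are nonzero; formula not verified for this case")
    return w0 * 2 + w1 * 512

print(cu_length_from_words(bytes([0xAC, 0x00, 0x03, 0x00, 0, 0, 0, 0])))  # 1880
```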


The ELF machine code is "Microchip Technology dsPIC30F", internally EM_DSPIC30F.
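Since the machine code identifies these binaries, a caller could check it before attempting DWARF parsing. A minimal stdlib-only sketch, assuming EM_DSPIC30F is 118 as in the ELF e_machine registry; e_machine is the uint16 at offset 18 in both 32- and 64-bit ELF headers, in the byte order given by EI_DATA (e_ident[5]). The function names are hypothetical.

```python
import struct

EM_DSPIC30F = 118  # ELF e_machine value for Microchip Technology dsPIC30F

def elf_machine(header: bytes) -> int:
    """Return e_machine from the first 20 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big endian
    return struct.unpack_from(endian + "H", header, 18)[0]

def is_dspic_elf(path: str) -> bool:
    with open(path, "rb") as f:
        return elf_machine(f.read(20)) == EM_DSPIC30F
```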


In conclusion, parsing this is definitely not a job for pyelftools :)

In theory, armed with this knowledge and with knowledge of DWARF proper, I could slap together an ad hoc parser that would dump the DIE tree for the OP. But I'm now wondering how far along the OP is in their quest to recover variable datatypes from the readelf output, as I suggested.

@yashagarwal-314 see the latest on #518.


EDIT: all odd nonzero bytes in this issue's binary follow the same pattern. They are all 0x80, appearing as the second byte in the logically 2-byte (physically 4-byte) encoding of the DW_AT_language attribute in the top DIE of sources assembled with GNU AS. Ostensibly, the value of the attribute is 1, which stands for ANSI C, but AS is not a C compiler. I guess it's the assembler's way of marking its compile units: read together, 0x01 and 0x80 form 0x8001, which is DW_LANG_Mips_Assembler, the language code GNU AS normally emits. Or it could be a bug in the way AS emits DW_AT_language :) Either way, introducing special-case handling just for that doesn't make a lot of sense. So, barring other wrinkles, I think you can consider the monkeypatch from #518 workable for this issue's binary too.
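For anyone checking another binary for the same pattern, a small sketch: list every nonzero byte at an odd offset of a section, and undo the widening by keeping only the even-offset (low) bytes. This assumes every logical DWARF byte is stored as a little-endian uint16 with a zero high byte, so the 0x80 markers described above are the only bytes dropped. It is a hypothetical illustration of that interpretation, not necessarily what the #518 monkeypatch does.

```python
def odd_nonzero_bytes(data: bytes):
    """List (offset, value) for every nonzero byte at an odd offset."""
    return [(i, b) for i, b in enumerate(data) if i % 2 and b]

def narrow(data: bytes) -> bytes:
    """Keep the low byte of each uint16 unit (assumes widened single bytes)."""
    return data[0::2]

# Hypothetical fragment: a widened value followed by the 01 80 00 00 pattern.
fragment = bytes([0xAC, 0x00, 0x03, 0x00, 0x01, 0x80, 0x00, 0x00])
print(odd_nonzero_bytes(fragment))  # [(5, 128)]
print(narrow(fragment).hex())       # ac030100
```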