eliben / pyelftools

Parsing ELF and DWARF in Python

Does pyelftools support multi-threaded handling?

ytxloveyou opened this issue

When the input ELF files are very large, the processing time becomes very long. So I am wondering whether it could support multi-threaded or multi-process handling.

Python is notoriously single threaded :)

Pyelftools tries to lazy-load where possible. What's the usage scenario exactly? ELF proper parsing, or DWARF? For one thing, the DWARF handling portions of the library tend to load sections as a whole, with no support for progressive loading.

OK, thanks for the feedback. I am trying to use it to reconstruct and analyze variable types (structures, unions and so on) from the ELF DWARF info. When the ELF file is smaller than 5 MB, it is fine. But if it is bigger, it might take 5-15 minutes to go through all the compile units and their contents. So I am wondering whether multi-processing or multi-threading could help in this case.
If not, maybe I need to find out how to do it in C, or find some other way to run multiple processes in the meantime.

The reason I raise this issue is that in our company (automotive products) we use the Vector toolchain (maybe you know it), which has an ASAP2 tool that can analyze ELF files to generate symbols with types. It runs very fast. I really want to know how they achieve it.

So it's DWARF. Have you timed the execution - which call exactly is taking 15 minutes?
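One way to answer the timing question is to run the slow code under the standard-library profiler and look at cumulative time per call. The helper below is only a sketch; `profile_call` is a name made up for this example, and the function you pass in is whatever your own export routine is.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile.

    Returns (result, report), where report is a text table of the `top`
    entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

Calling `profile_call(my_export, "firmware.elf")` (both names hypothetical) and printing the report should show whether the time goes into section loading, DIE parsing, or the type lookup loop.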

Just assume that there are more than 10000 compile units. Calling cu_iter might take 5 minutes. Then, for each compile unit, if I need to export all variables with their types, I need to loop through all the DT members to find the relevant type definition, and this might take 10 minutes in some cases. Something like that. Maybe I did something wrong.

So off the top of my head, one thing you can try is iterating through DIEs smarter. Are you after all variables, or all static lifetime variables? Globals, static class members, or both? When it comes to modern DWARF, navigating to a sibling is a fast operation. You could try that instead of scrolling through all DIEs.
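A rough sketch of that idea, assuming you only want variables declared directly under each CU's top DIE (variables inside namespaces would need one more level). It uses `iter_children()` rather than walking every DIE, and follows `DW_AT_type` with `get_DIE_from_attribute()` instead of searching for the type DIE. As far as I know, pyelftools can only skip nested subtrees quickly when the producer emitted `DW_AT_sibling`. The helper names are made up for this example.

```python
import sys


def iter_top_level_variables(dwarfinfo):
    """Yield (cu, die) for DW_TAG_variable DIEs directly under each CU's top DIE."""
    for cu in dwarfinfo.iter_CUs():
        top_die = cu.get_top_DIE()
        # Only direct children are yielded; function bodies and struct
        # members are not returned to this loop.
        for die in top_die.iter_children():
            if die.tag == "DW_TAG_variable":
                yield cu, die


def die_name(die):
    """Return DW_AT_name as str, or None if the DIE has no name."""
    attr = die.attributes.get("DW_AT_name")
    if attr is None:
        return None
    value = attr.value
    return value.decode("utf-8", "replace") if isinstance(value, bytes) else value


def variable_type_die(die):
    """Follow DW_AT_type directly; returns None when the DIE has no type."""
    if "DW_AT_type" not in die.attributes:
        return None
    return die.get_DIE_from_attribute("DW_AT_type")


if __name__ == "__main__" and len(sys.argv) > 1:
    from elftools.elf.elffile import ELFFile

    with open(sys.argv[1], "rb") as f:
        elf = ELFFile(f)
        if elf.has_dwarf_info():
            for cu, die in iter_top_level_variables(elf.get_dwarf_info()):
                type_die = variable_type_die(die)
                print(die_name(die), type_die.tag if type_die else None)
```

Typedefs, pointers and qualifiers chain through further `DW_AT_type` references, so a full type dump would repeat `variable_type_die()` on the returned DIE until it reaches a base or aggregate type.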

Iterating between compile units is not a long operation per se - there is a linked list-like data structure there, going from a CU to the next CU is fast. That said, the section needs to be loaded first, and that's a time consuming piece of I/O.
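To check this on a given file, the two phases can be timed separately. This is a minimal sketch; `time_phases` is a made-up name, and it assumes an already opened `ELFFile` whose `get_dwarf_info()` has not been called yet.

```python
import sys
import time


def time_phases(elffile):
    """Time DWARF section loading and CU iteration separately."""
    start = time.perf_counter()
    dwarfinfo = elffile.get_dwarf_info()      # loads the debug sections
    loaded = time.perf_counter()
    cu_count = sum(1 for _ in dwarfinfo.iter_CUs())  # walks CU headers only
    done = time.perf_counter()
    return {
        "load_seconds": loaded - start,
        "iterate_seconds": done - loaded,
        "cu_count": cu_count,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    from elftools.elf.elffile import ELFFile

    with open(sys.argv[1], "rb") as f:
        print(time_phases(ELFFile(f)))
```

If `load_seconds` dominates, the cost is I/O on the sections; if neither number is large, the time is going into the per-DIE work done afterwards.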

One more thing I thought of: you might be able to shave some time off the I/O by partially reimplementing get_DWARF_info() so that it doesn't load the miscellaneous sections. There is no built-in support for lazy loading of those (that I know of), so you'd have to roll your own. It's a good idea for an improvement, though. Depending on what kind of information you want to dump, loclists might be necessary.
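Before rolling your own loader, it may be worth checking how large each debug section actually is, to see whether skipping some of them could matter. The sketch below does not implement the custom loader; it only lists on-disk section sizes from the section headers. `debug_section_sizes` is a made-up name.

```python
import sys


def debug_section_sizes(elffile):
    """Map each .debug_* / .zdebug_* section name to its on-disk size in bytes,
    largest first."""
    sizes = {}
    for section in elffile.iter_sections():
        name = section.name
        if name.startswith((".debug_", ".zdebug_")):
            sizes[name] = section["sh_size"]
    # Sort by size, descending, so the biggest I/O cost is listed first.
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__" and len(sys.argv) > 1:
    from elftools.elf.elffile import ELFFile

    with open(sys.argv[1], "rb") as f:
        for name, size in debug_section_sizes(ELFFile(f)).items():
            print(f"{size:>12}  {name}")
```

If `.debug_info` and `.debug_abbrev` already account for nearly all of the bytes, skipping the other sections is unlikely to save much.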