CheckPointSW / Karta

Karta - source code assisted fast binary matching plugin for IDA

Support python-idb

XVilka opened this issue · comments

Is it possible to use the well-known python-idb library?
It would remove the need to run IDA Pro itself, which is perfect for automation purposes.

Karta consists of 2 phases:

  1. Analysis - Creating a .json configuration from the compiled .o/.obj files
  2. Matching - Matching the functions in the binary, and showing the matching results in the UI

python-idb won't help with the analysis phase, as there is no .idb file at that point. And in the matching phase we need some UI/GUI from the disassembler in order to show the results.

Instead, we chose to define a disassembler-api, see docs here: https://karta.readthedocs.io/en/latest/disassembler.html
Any disassembler that implements this API will be supported by Karta. Indeed, support for radare2 is almost finished already; it is currently being developed by megabeets.
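To illustrate the idea of a disassembler abstraction layer, here is a minimal sketch. It is not Karta's actual disassembler API (see the linked docs for that): the names `DisassemblerBackend`, `InMemoryBackend`, and `apply_matches` are hypothetical and exist only for this example.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Tuple


class DisassemblerBackend(ABC):
    """Hypothetical minimal interface; NOT Karta's real disassembler API."""

    @abstractmethod
    def functions(self) -> Iterator[Tuple[int, str]]:
        """Yield (address, name) pairs for every known function."""

    @abstractmethod
    def rename_function(self, address: int, new_name: str) -> None:
        """Rename the function that starts at the given address."""


class InMemoryBackend(DisassemblerBackend):
    """Toy backend that stands in for a real disassembler in this sketch."""

    def __init__(self, functions: Dict[int, str]) -> None:
        self._functions = dict(functions)

    def functions(self) -> Iterator[Tuple[int, str]]:
        return iter(sorted(self._functions.items()))

    def rename_function(self, address: int, new_name: str) -> None:
        if address not in self._functions:
            raise KeyError(f"no function at {address:#x}")
        self._functions[address] = new_name


def apply_matches(backend: DisassemblerBackend, matches: Dict[int, str]) -> int:
    """Apply matched names through the backend; return how many were applied."""
    known = {address for address, _ in backend.functions()}
    applied = 0
    for address, name in matches.items():
        if address in known:
            backend.rename_function(address, name)
            applied += 1
    return applied
```

The matching logic only talks to the abstract interface, so each disassembler needs nothing more than its own backend class.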

On second thought, automation using python-idb could be useful, for example for automatically identifying the open-source libraries used in a large dataset of binaries.

As I have no prior experience with the python-idb library, I suggest that you implement it using the disassembler-api. If you run into any trouble with the implementation, ping me for help. Alternatively, you could wait for the radare2 disassembler-api example, which should be published in the near future.

Just for the record - we updated python-idb to support IDB files from all IDA Pro versions 5.x through 7.5, covered by tests. The latest release (0.7.1) is available on PyPI: https://pypi.org/project/python-idb/

I tried to add a "semi-disassembler" API for python-idb, but it doesn't seem mature enough to be worth the effort of implementing the missing parts.

First, python-idb only parses existing .idb files, meaning that it can't be used to create the config files for compiled open-source libraries.

Second, it is a CLI-based utility, meaning that the user input currently collected through the GUI API (configs directory, "Is Windows Binary") would have to be passed in some other way. That change affects the entire project and can't simply be contained in a "disassembler" implementation.
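As a rough sketch of what "passed in some other way" could look like, the two GUI inputs could become command-line arguments. The option names `--configs-dir` and `--windows` and the `parse_args` helper are hypothetical; Karta does not define them.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the inputs that the GUI would otherwise ask for (hypothetical flags)."""
    parser = argparse.ArgumentParser(
        description="Headless replacement for the GUI input form (sketch only)."
    )
    parser.add_argument("idb_path", help="path to the input .idb file")
    parser.add_argument(
        "--configs-dir",
        required=True,
        help="directory that holds the .json configuration files",
    )
    parser.add_argument(
        "--windows",
        action="store_true",
        help='equivalent of ticking "Is Windows Binary" in the GUI',
    )
    return parser.parse_args(argv)


# Example invocation with an explicit argument list instead of sys.argv.
args = parse_args(["--configs-dir", "./configs", "--windows", "firmware.idb"])
```

Because the .idb path is itself an argument here, a headless script would also know where its input file lives without asking the database for it.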

Third, while the matching results could be presented as a stdout printout, applying these matches to the .idb is impossible, again because python-idb is a read-only API to the .idb file.
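A stdout printout of the results could be as simple as the sketch below. The `format_matches` helper and the output layout are assumptions made for illustration, not part of Karta or python-idb.

```python
from typing import Dict


def format_matches(matches: Dict[int, str]) -> str:
    """Render address -> matched name pairs as one line each, sorted by address."""
    lines = [
        f"0x{address:08x}  {name}"
        for address, name in sorted(matches.items())
    ]
    return "\n".join(lines)


# Hypothetical match results, used only to show the output format.
report = format_matches({0x2000: "inflate", 0x1000: "png_read_info"})
print(report)
```

This only reports the matches; nothing is written back to the .idb.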

Despite all of the above, I tried to implement an API that would work at least for karta_identifier.py, but I gave up when I saw the following:

  1. sark isn't supported, so I would need to reimplement all of the basic utilities on top of the python-idb API
  2. ida_search isn't supported, so the immediate search (used heavily by Karta) would have to be replaced somehow
  3. I couldn't access the path of the input .idb file, which is required for creating the output file of the identifier script

At the moment, given the scarce functionality offered by this "semi" disassembler, I fail to see the value of breaking my teeth on adding this support. As I said before, feel free to implement this support and send a pull request. After all, you know python-idb far better than I do.