lunixbochs / usercorn

dynamic binary analysis via platform emulation

Regarding the Python interface

thebabush opened this issue

Hi,

first of all, thanks for the tool. During the last DEFCON finals I realized once again how few tools there are for scriptable, multi-arch userspace emulation.
Usercorn looks like it might be THE tool if it matures enough.

That being said, I noticed you removed the Python interface in favor of #184.
Like everyone in security, I love Python, so I put together a small PoC of how a Go/Python integration could work (thebabush/usercorn).

If you want to try it, just run make && make py && ./make.sh.

So the idea is to use opaque handles plus reference counting as a way around Go's garbage collector.
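A minimal Go sketch of the opaque-handle idea, assuming a hypothetical handle table (`HandleTable`, `Register`, `Retain`, `Release` and `Get` are illustrative names, not taken from usercorn or the PoC). Foreign code only ever holds an integer; the Go object stays reachable through the table until its count drops to zero. This respects cgo's rule that C code may not keep a copy of a Go pointer after a call returns. For the non-refcounted case, Go 1.17+ also ships `runtime/cgo.Handle`.

```go
package main

import (
	"fmt"
	"sync"
)

// Handle is the opaque integer that C/Python code holds instead of a Go pointer.
type Handle uint64

type entry struct {
	obj  interface{}
	refs int
}

// HandleTable keeps Go objects reachable while foreign code references them.
type HandleTable struct {
	mu      sync.Mutex
	next    Handle
	entries map[Handle]*entry
}

func NewHandleTable() *HandleTable {
	return &HandleTable{next: 1, entries: make(map[Handle]*entry)}
}

// Register stores obj with a reference count of 1 and returns its handle.
func (t *HandleTable) Register(obj interface{}) Handle {
	t.mu.Lock()
	defer t.mu.Unlock()
	h := t.next
	t.next++
	t.entries[h] = &entry{obj: obj, refs: 1}
	return h
}

// Retain increments the reference count; it reports false for unknown handles.
func (t *HandleTable) Retain(h Handle) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	e, ok := t.entries[h]
	if ok {
		e.refs++
	}
	return ok
}

// Release decrements the count; at zero the entry is dropped so the
// garbage collector can reclaim the object.
func (t *HandleTable) Release(h Handle) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	e, ok := t.entries[h]
	if !ok {
		return false
	}
	e.refs--
	if e.refs == 0 {
		delete(t.entries, h)
	}
	return true
}

// Get resolves a handle back to the Go object.
func (t *HandleTable) Get(h Handle) (interface{}, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	e, ok := t.entries[h]
	if !ok {
		return nil, false
	}
	return e.obj, true
}

func main() {
	table := NewHandleTable()
	h := table.Register("emulator instance")
	table.Retain(h)  // a second foreign reference
	table.Release(h) // one reference remains
	obj, ok := table.Get(h)
	fmt.Println(obj, ok)
	table.Release(h) // last reference gone; object is now collectable
	_, ok = table.Get(h)
	fmt.Println(ok)
}
```

In an actual binding, each exported C function would take a `Handle` argument and call `Get` first.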
I think most (all?) of go/models/usercorn.go could easily be mapped to a C API or FFI.
Doing the mapping manually is a PITA, but after some experiments with parsing Go, I'd say doing it automatically and properly would be a whole project by itself (maybe a regex-based approach would suffice, which AFAIK is what the z3 Python bindings do).

Still, why not create a barebones plugin mechanism instead of exposing usercorn as a shared object? Something like usercorn --whatever whatever.so ./my_binary.
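On the Go side, one possible shape for such a plugin mechanism is the standard `plugin` package, sketched below. The exported symbol name `OnLoad` and its signature are hypothetical conventions, and `--whatever` above remains a placeholder. Note that `plugin` loads Go-built shared objects (`go build -buildmode=plugin`), is only supported on Linux, FreeBSD and macOS, needs cgo, and requires host and plugin to be built with the same toolchain and package versions. Loading an arbitrary C `.so` would need a different route.

```go
package main

import (
	"fmt"
	"os"
	"plugin"
)

// loadHook opens a Go plugin and looks up a function named OnLoad.
// The name and the func(string) error signature are assumptions for this sketch.
func loadHook(path string) (func(string) error, error) {
	p, err := plugin.Open(path)
	if err != nil {
		return nil, err
	}
	sym, err := p.Lookup("OnLoad")
	if err != nil {
		return nil, err
	}
	// For functions, Lookup returns the function value itself.
	fn, ok := sym.(func(string) error)
	if !ok {
		return nil, fmt.Errorf("OnLoad in %s has type %T, want func(string) error", path, sym)
	}
	return fn, nil
}

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: host whatever.so ./my_binary")
		return
	}
	hook, err := loadHook(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "plugin:", err)
		os.Exit(1)
	}
	// Hand the target binary path to the plugin before emulation starts.
	if err := hook(os.Args[2]); err != nil {
		fmt.Fprintln(os.Stderr, "hook:", err)
		os.Exit(1)
	}
}
```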

My PoC uses CFFI at compile time, which should be faster than a shared plugin and should support Python 2/3 and PyPy.

I'm opening this issue to see what you think about it. It's just a hack for now, but it looks like a viable way of implementing scripting (or a general C API).

Usercorn already has a scripting engine (luaish), which is Lua modified to be more Pythonic, and it automatically binds the whole API.

See some basic examples here: https://github.com/Caesurus/usercorn_examples

Due to limitations in Go, usercorn definitely needs to be the script host. It doesn't make sense to compile usercorn to a shared object at all.

Another option is to generate an RPC layer and run the scripts in a separate process. Go's reflection is fairly capable, so much of that layer could be generated automatically. Neovim's msgpack-rpc system is very flexible and easily serves many programming languages and embedding styles, so it might be worth looking in that direction.

Honestly, the first step is going to be collecting goals for the scripting system. For example, I want to make it very easy to write a custom binary loader, or to extend an existing binary loader with a script.
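One way to picture "extend an existing loader" is a decorator: wrap the built-in loader and override only what differs. The Go sketch below models that with struct embedding. `Loader`, `Segment`, `rawLoader` and `entryOverride` are hypothetical and do not reflect usercorn's actual loader interface. In a scripting system, the overriding piece would presumably come from a script rather than Go code.

```go
package main

import "fmt"

// Loader is a hypothetical, trimmed-down loader interface for this sketch.
type Loader interface {
	Arch() string
	Entry() uint64
	Segments() []Segment
}

// Segment is a chunk of the binary mapped at Addr.
type Segment struct {
	Addr uint64
	Data []byte
}

// rawLoader stands in for an existing built-in loader.
type rawLoader struct {
	arch  string
	entry uint64
	blob  []byte
}

func (r rawLoader) Arch() string        { return r.arch }
func (r rawLoader) Entry() uint64       { return r.entry }
func (r rawLoader) Segments() []Segment { return []Segment{{Addr: 0x1000, Data: r.blob}} }

// entryOverride embeds an existing Loader and replaces only Entry();
// every other method is forwarded to the wrapped loader.
type entryOverride struct {
	Loader
	entry uint64
}

func (e entryOverride) Entry() uint64 { return e.entry }

func main() {
	base := rawLoader{arch: "arm", entry: 0x1000, blob: []byte{0x00, 0x00, 0xa0, 0xe1}} // placeholder bytes
	var l Loader = entryOverride{Loader: base, entry: 0x1004}
	fmt.Printf("%s entry=%#x segments=%d\n", l.Arch(), l.Entry(), len(l.Segments()))
}
```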

I do know about the scripting engine, thanks.

RPC sounds slow to me, but I may well be wrong.

Yeah, that's a good point. Being able to extend the loader would be awesome. Still, I think a common use case might be using usercorn as an advanced/scriptable debugger or as a multi-arch (slow-ish) instrumentation tool. AFAIK the current offerings (Pin/QBDI/PANDA/etc.) don't cover that space. As of now, there's no easy way (that I know of) to trace an ARM Linux binary on x86. Yes, you can use QEMU user mode and its logging feature, but it is far from ideal.

RPC is mostly only going to be slow if you're doing something at a per-instruction level, and there's no fast way to do that in Python with or without RPC.

I can also make some pretty big performance improvements to unicorn/usercorn at some point for read-only hooks (when you don't need to modify anything during the hook callback).

If you're using usercorn as a scriptable debugger or doing things like basic binary loading, RPC should be plenty fast enough. If you're doing read-only tracing, you can trace to a file and parse it in whatever language you want, which will be way faster than trying to do it in-process due to FFI overhead.

An RPC layer could also inject simple luaish snippets/hooks that modify behavior without needing a round trip through the RPC interface.

It would help me think about this quite a bit if you described the main specific APIs you want, plus what sorts of things about the emulator you want to be able to script.

Stuff like:

  • Breakpoints/callbacks on complex conditions
  • Modify registers/memory
  • Save/restore CPU state
  • Tracing

All of this could be used to write quick dynamic analysis scripts to help with reversing in IDA/binja: taint tracking, tracing memory operations, stuff like that. It could also be used to prototype all sorts of tools.
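For the taint-tracking use case, the core data structure can be as small as a byte-granular shadow of memory. The Go sketch below is a hypothetical minimal version covering only taint sources and memcpy-style propagation. Full taint tracking would need to follow register and memory data flow for every instruction, which is the per-instruction workload discussed above as a poor fit for RPC.

```go
package main

import "fmt"

// Taint is a hypothetical byte-granular shadow memory: an address maps to true
// if its value derives from tainted input (e.g. bytes returned by read()).
type Taint map[uint64]bool

// Source marks [addr, addr+n) as tainted.
func (t Taint) Source(addr, n uint64) {
	for i := uint64(0); i < n; i++ {
		t[addr+i] = true
	}
}

// Copy propagates taint for a memcpy-like copy of n bytes from src to dst.
// Destination bytes whose source byte was clean become clean.
func (t Taint) Copy(dst, src, n uint64) {
	// Snapshot source flags first so overlapping ranges behave like memmove.
	flags := make([]bool, n)
	for i := range flags {
		flags[i] = t[src+uint64(i)]
	}
	for i, f := range flags {
		if f {
			t[dst+uint64(i)] = true
		} else {
			delete(t, dst+uint64(i))
		}
	}
}

// Tainted reports whether any byte in [addr, addr+n) is tainted.
func (t Taint) Tainted(addr, n uint64) bool {
	for i := uint64(0); i < n; i++ {
		if t[addr+i] {
			return true
		}
	}
	return false
}

func main() {
	t := Taint{}
	t.Source(0x7000, 8)       // e.g. 8 bytes read from an input file
	t.Copy(0x9000, 0x7004, 4) // copy the upper half somewhere else
	fmt.Println(t.Tainted(0x9000, 4), t.Tainted(0x7010, 4))
}
```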