lunixbochs / usercorn

dynamic binary analysis via platform emulation

Display More Symbols in Trace

Nokel81 opened this issue

Trace already has access to symbols, but it would be very useful to be able to see the label names where they are used in the assembly (for jumps, loads, and addressing).

Example:

.s:

mov eax, [some_label + 4]

...
some_label:
dd 3
dd 6

usercorn trace:

L some_label: 0x804a680
mov eax, dword ptr [0x804a680 + 4]
0x8048621: push ebp                        |    esp = 0xbffff7a8 | W bffff7a8
W 0xbffff7c8: f8f7ffbf                     [....                ] W
R 0xbffff7d0: 74000000 0cb00408            [t.......            ] R
W 0xbffff7bf: 74                           [t                   ] W
W 0xbffff7ac: f6860408                     [....                ] W
0x8048622: mov ebp, esp                    |    ebp = 0xbffff7a8
0x8048624: push edi                        |    esp = 0xbffff7a4 | W bffff7a4
0x8048625: push esi                        |    esp = 0xbffff7a0 | W bffff7a0
0x8048626: mov esi, ecx                    |    esi = 0x00000001
0x8048628: push ebx                        |    esp = 0xbffff79c | W bffff79c
0x8048629: mov ebx, eax                    |    ebx = 0x0804b00c
                                           +    ebx = _stdout
0x804862b: sub esp, 0xc                    | eflags = 0x00000084
                                           +    esp = 0xbffff790
0x804862e: test ecx, ecx                   | eflags = 0x00000000
0x8048630: je 0x8048684                    |                    

buf_write+0x11
0x8048632: mov eax, dword ptr [eax + 8]    |    eax = 0x00000000 | R 804b014 (_stdout+0x8)
W 0xbffff79c: 04f8ffbf                     [....                ] W
R 0x0804b014: 00000000                     [....                ] R
0x8048635: mov edi, edx                    |    edi = 0xbffff7bf
0x8048637: lea edx, [ecx + eax]            |    edx = 0x00000001
0x804863a: cmp edx, 0x1000                 | eflags = 0x00000081
0x8048640: jle 0x8048659                   |                    

buf_write+0x21
0x8048659: push edx                        |    esp = 0xbffff78c | W bffff78c
0x804865a: push ecx                        |    esp = 0xbffff788 | W bffff788
0x804865b: push edi                        |    esp = 0xbffff784 | W bffff784
0x804865c: lea eax, [ebx + eax + 0x10]     |    eax = 0x0804b01c
                                           +    eax = _stdout+0x10
0x8048660: push eax                        |    esp = 0xbffff780 | W bffff780
0x8048661: call 0x804889d                  |    esp = 0xbffff77c | W bffff77c

buf_write+0x45
0x804889d: push ebp                        |    esp = 0xbffff778 | W bffff778
W 0xbffff77c: 66860408                     [f...                ] W
0x804889e: xor edx, edx                    |    edx = 0x00000000
                                           + eflags = 0x00000044
0x80488a0: mov ebp, esp                    |    ebp = 0xbffff778
0x80488a2: mov eax, dword ptr [ebp + 8]    |                     | R bffff780
0x80488a5: push ebx                        |    esp = 0xbffff774 | W bffff774
0x80488a6: mov ebx, dword ptr [ebp + 0xc]  |    ebx = 0xbffff7bf | R bffff784
0x80488a9: cmp edx, dword ptr [ebp + 0x10] | eflags = 0x00000095 | R bffff788
0x80488ac: je 0x80488b7                    |                    

memcpy+0x11
0x80488ae: mov cl, byte ptr [ebx + edx]    |    ecx = 0x00000074 | R bffff7bf
W 0xbffff774: 0cb00408                     [....                ] W
R 0xbffff780: 1cb00408 bff7ffbf 01000000   [............        ] R
R 0xbffff7bf: 74                           [t                   ] R
0x80488b1: mov byte ptr [eax + edx], cl    |                     | W 804b01c (_stdout+0x10)
0x80488b4: inc edx                         |    edx = 0x00000001
                                           + eflags = 0x00000001
0x80488b5: jmp 0x80488a9                   |                    

memcpy+0x1a
0x80488a9: cmp edx, dword ptr [ebp + 0x10] | eflags = 0x00000044 | R bffff788
W 0x0804b01c: 74                           [t                   ] W
0x80488ac: je 0x80488b7                    |                    

I hacked this together. Notice the two assignments to ebx (one for the address, one for the symbol) and the _stdout+0x10 annotation on the memory write:

0x8048629: mov ebx, eax                    |    ebx = 0x0804b00c
                                           +    ebx = _stdout
0x80488b1: mov byte ptr [eax + edx], cl    |                     | W 804b01c (_stdout+0x10)
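An annotation like `_stdout+0x10` comes from finding the nearest symbol at or below an address and printing the offset from its start. The Go sketch below shows that idea with a binary search over a symbol list sorted by address. `Symbol` and `symbolicate` are hypothetical names, not usercorn's API. A real symbolicator would probably also check symbol sizes so that addresses far past the end of a symbol aren't attributed to it. The addresses are taken from the nm output and REPL session elsewhere in this thread.

```go
package main

import (
	"fmt"
	"sort"
)

// Symbol is a hypothetical minimal symbol record: a name and its start address.
type Symbol struct {
	Name  string
	Start uint64
}

// symbolicate returns "name" or "name+0xoff" for the closest symbol that
// starts at or below addr, or "" if addr is below every symbol.
// syms must be sorted by Start in ascending order.
func symbolicate(syms []Symbol, addr uint64) string {
	// Index of the first symbol that starts strictly after addr.
	i := sort.Search(len(syms), func(i int) bool { return syms[i].Start > addr })
	if i == 0 {
		return ""
	}
	s := syms[i-1]
	if off := addr - s.Start; off != 0 {
		return fmt.Sprintf("%s+%#x", s.Name, off)
	}
	return s.Name
}

func main() {
	syms := []Symbol{
		{"stdout", 0x804b000},
		{"stdin", 0x804b004},
		{"stderr", 0x804b008},
		{"_stdout", 0x804b00c},
	}
	fmt.Println(symbolicate(syms, 0x804b00c)) // _stdout
	fmt.Println(symbolicate(syms, 0x804b01c)) // _stdout+0x10
	fmt.Println(symbolicate(syms, 0x804b004)) // stdin
}
```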

It would be harder to put symbols in the actual disassembly, because the disassembler backend (capstone) doesn't trivially support symbols.

That is definitely quite good and would be very helpful

https://github.com/lunixbochs/usercorn/tree/sym-ui
Notice that the commit was very simple and the stream UI itself isn't much code. Feel free to tweak it further and offer suggestions, or even reskin the whole stream UI yourself to support whatever specific thing you're doing :)

Usercorn has a very nice execution trace format that the UI uses. The stream UI doesn't need access to the running CPU; it does everything using only the trace! You can also write your own programs to analyze it.

The file format is here: https://github.com/lunixbochs/usercorn/blob/master/go/models/trace/PROTOCOL.md

You can generate trace files with usercorn run -trace -to filename ./binary
You can run the UI on trace files with usercorn trace -pretty filename
You can convert a trace to JSON if you want to easily parse it in Python or something with usercorn trace -json filename
You can convert a trace to drcov for importing into Lighthouse in IDA or something with usercorn trace -drcov coveragefilename filename

After I reinstalled, I get the following error when I try to run it:

Error: Could not identify file magic.
        load.go:38      | loader.LoadArch()
        load.go:19      | loader.Load()
        usercorn.go:167 | go.NewUsercorn()
        cmd.go:68       | cmd.NewUsercornCmd.func1()
        cmd.go:352      | cmd.(*UsercornCmd).Run()
        main.go:10      | run.Main()
        launcher.go:45  | cmd.Main()
        main.main()

What command are you running?

usercorn run -trace -mtrace -rtrace filename

You only need -trace; -mtrace and -rtrace are implied by -trace. Judging by that error, filename must not be an executable. Did you overwrite it or something?
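"Could not identify file magic" means the loader didn't recognize the first bytes of the file as an executable format. A quick way to check a file yourself is to compare its leading bytes against known magic numbers. The Go sketch below is a standalone check, not usercorn's loader logic; it only knows a few common formats (ELF, DOS/PE, and little-endian Mach-O), which is an assumption about what matters here. A shell script or text file will come out as `unknown`.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// identifyMagic guesses an executable format from the leading magic bytes.
// It only recognizes a handful of formats and is not usercorn's loader.
func identifyMagic(header []byte) string {
	switch {
	case bytes.HasPrefix(header, []byte("\x7fELF")):
		return "ELF"
	case bytes.HasPrefix(header, []byte("MZ")):
		return "PE/DOS"
	// 0xfeedface / 0xfeedfacf stored little-endian (32-/64-bit Mach-O).
	case bytes.HasPrefix(header, []byte{0xce, 0xfa, 0xed, 0xfe}),
		bytes.HasPrefix(header, []byte{0xcf, 0xfa, 0xed, 0xfe}):
		return "Mach-O"
	default:
		return "unknown"
	}
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: magic <file>")
		os.Exit(2)
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	// Read up to 4 bytes; a shorter file just yields a shorter header.
	header := make([]byte, 4)
	n, _ := io.ReadFull(f, header)
	fmt.Println(identifyMagic(header[:n]))
}
```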

It must not be an executable? I think my problem was that it wasn't one. But now I just don't get the new output. Is that only shown when displaying a trace afterwards, or does run -trace print it too?

I have run nm on the executable, and it does output symbols

usercorn run -trace bins/x86.linux.elf should have the new symbol output

Thanks. Later I'll try to figure out why some of the memory writes don't show symbols on the right side, but it does look like symbols are printed for loads into registers.

There are two outputs for memory writes - membatch, which is the hexdump, and mtrace, which is the R / W addrs. I would be very surprised if an assembly instruction accessed memory and there wasn't a corresponding mtrace output for it.

I was talking about the mtrace output. I know that address is a symbol, but the symbol name didn't show up on the right.

what usercorn outputs:

mov dword ptr [0x804c0fc], eax                    |                     | W 804c0fc

what is in the assembly file:

mov [STATIC_Boolean_MAX_VALUE], eax

is STATIC_Boolean_MAX_VALUE an actual symbol that ended up in the file?

usercorn doesn't have any way of seeing the assembly file; it just parses debug information from the executable.

Yes, running nm on the file I get

0804c0fc D STATIC_Boolean_MAX_VALUE

What is D?

           "D", "d": The symbol is in the initialized data section.

You should look into how Go loads symbols from an ELF file and make sure that symbol is there.

You can import fmt and put some prints in the loops in this function https://github.com/lunixbochs/usercorn/blob/master/go/loader/elf.go#L164

In my example, these symbols are in the same area and work fine:

0804b008 D stderr
0804b004 D stdin
0804b000 D stdout

If you run usercorn run -mtrace -v filename, it will show you the memory mapping. Make sure the address you're looking at is attributed to the binary.

In my case:

  0x8048000-0x804a000 r-x [exe] bins/x86.linux.elf
  0x804b000-0x804f000 rw- [exe] bins/x86.linux.elf(0x2000)

0x0804b00c is within the 0x804b000-0x804f000 mapping, so that mapping is what gets queried to look up _stdout.
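The containment check described here can be sketched in a few lines of Go: find the mapping whose range covers the address, then report the address as mapping start plus offset, in the same shape the REPL's `maps` command prints. `Mapping` and `findMapping` are hypothetical names, and treating the end address as exclusive is an assumption about how the ranges are printed.

```go
package main

import "fmt"

// Mapping is a hypothetical memory region covering [Start, End).
type Mapping struct {
	Start, End uint64
	Desc       string
}

// findMapping returns the mapping containing addr, if any.
func findMapping(maps []Mapping, addr uint64) (Mapping, bool) {
	for _, m := range maps {
		if addr >= m.Start && addr < m.End {
			return m, true
		}
	}
	return Mapping{}, false
}

func main() {
	// Mappings copied from the -v output above.
	maps := []Mapping{
		{0x8048000, 0x804a000, "r-x [exe] bins/x86.linux.elf"},
		{0x804b000, 0x804f000, "rw- [exe] bins/x86.linux.elf(0x2000)"},
	}
	addr := uint64(0x0804b00c)
	if m, ok := findMapping(maps, addr); ok {
		// Prints: 0x804b00c: 0x804b000+0xc (rw- [exe] bins/x86.linux.elf(0x2000))
		fmt.Printf("%#x: %#x+%#x (%s)\n", addr, m.Start, addr-m.Start, m.Desc)
	} else {
		fmt.Printf("%#x: not in any mapping\n", addr)
	}
}
```

A linear scan is fine for a handful of mappings; with many regions, keeping them sorted and using a binary search would scale better.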

If I run ./usercorn run -repl bins/x86.linux.elf, I can run maps 0x0804b00c to look up the map as well:

0x8048cb7> maps 0x0804b00c
Memory map:
0x804b000-0x804f000 rw- [exe] bins/x86.linux.elf(0x2000)
0x804b00c: 0x804b000+0xc

And:

0x8048cb7> us:Symbolicate(0x0804b00c, false)
userdata: 0xc420b519b0 "_stdout"