Cannot reproduce the snapshot for HEVD fuzzer
mkubx opened this issue · comments
Hello. I followed the instructions here and installed W10 inside a Hyper-V VM, 4GB RAM, 1 CPU. I also disabled KVA and used lockmem.exe when doing the memory dump (!bdump).
Still, when I run the fuzzer (I modified only ExGenRandom to reflect my Windows version), I am getting the following errors:
- Master
Seeded with 367867526817988208
Iterating through the corpus..
Sorting through the 1 entries..
Running server on tcp://localhost:31337..
#0 cov: 0 (+0) corp: 0 (0.0b) exec/s: -nan (1 nodes) lastcov: 2.0s crash: 0 timeout: 0 cr3: 0 uptime: 2.0s
Could not receive size (-1)
Receive failed
#0 cov: 0 (+0) corp: 0 (0.0b) exec/s: -nan (0 nodes) lastcov: 3.0s crash: 0 timeout: 0 cr3: 0 uptime: 3.0s
- Fuzzer node
Initializing the debugger instance.. (this takes a bit of time)
Setting debug register status to zero.
Setting debug register status to zero.
Dialing to tcp://localhost:31337/..
VirtTranslate failed for GVA:0xfffff80711b78600
Here is the dump of my image.
I also did some sanity checks to make sure the code was not swapped out, as mentioned in #41; for instance, KERNELBASE!DeviceIoControl disassembles correctly. How can I debug this issue? I tried to create images 10-15 times with more or less the same result. Also, I have no problem running the fuzzer with the snapshot you provided. Thanks.
Hey @mkubx,
thanks for the detailed report! I haven't had a chance to look at this in detail yet, but here are a few ideas if you'd like to try to debug this yourself (I will take a look at it this week or this weekend):
- When you run into this kind of error message, the best approach is to open the dump in windbg and check things there. In this case, we can clearly see there's no memory at that address:
kd> db 0xfffff80711b78600
fffff807`11b78600 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
fffff807`11b78610 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
fffff807`11b78620 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
fffff807`11b78630 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
fffff807`11b78640 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
fffff807`11b78650 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
fffff807`11b78660 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
fffff807`11b78670 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
kd> !address 0xfffff80711b78600
Mapping user range ...
Mapping system range ...
Mapping non addressable range ...
Mapping page tables...
Mapping hyperspace...
Mapping HAL reserved range...
Mapping User Probe Area...
Mapping system shared page...
Mapping system cache working set...
Mapping loader mappings...
Mapping system PTEs...
Mapping system paged pool...
Mapping session space...
Mapping dynamic system space...
Mapping PFN database...
Mapping non paged pool...
Mapping VAD regions...
Mapping module regions...
Mapping process, thread, and stack regions...
Mapping system cache regions...
Usage: Module
Base Address: fffff807`11af0000
End Address: fffff807`11b7c000
Region Size: 00000000`0008c000
VA Type: BootLoaded
Module name: HEVD.sys
Module path: [\??\C:\Users\user\Downloads\hevd-unpacked\driver\vulnerable\x64\HEVD.sys]
- If you would like to understand how the code ends up accessing 0xfffff80711b78600, you should generate a rip trace (--backend bxcpu --trace-type rip) and then symbolize the trace w/ symbolizer. This way you can figure out very precisely how you end up where you end up.
Hope this helps for now!
Cheers
Hey @0vercl0k, thank you for your time and support. I still have no idea what's happening and I am curious to figure it out, here is what I got so far.
- 0xfffff80711b78600 is not mapped even on the live machine.
- I tracked the values in my debugger with the same IOCTL as the one used during fuzzing, and the call exited successfully without error.
- But there was a difference compared with my trace logs after the syscall happened:
ntdll!NtDeviceIoControlFile+0x3:
0033:00007ff8`273ad0d3 b807000000 mov eax,7
kd>
ntdll!NtDeviceIoControlFile+0x8:
0033:00007ff8`273ad0d8 f604250803fe7f01 test byte ptr [SharedUserData+0x308 (00000000`7ffe0308)],1
kd>
ntdll!NtDeviceIoControlFile+0x10:
0033:00007ff8`273ad0e0 7503 jne ntdll!NtDeviceIoControlFile+0x15 (00007ff8`273ad0e5)
kd>
ntdll!NtDeviceIoControlFile+0x12:
0033:00007ff8`273ad0e2 0f05 syscall
kd>
ntdll!NtDeviceIoControlFile+0x14:
0033:00007ff8`273ad0e4 c3 ret
kd>
KERNELBASE!DeviceIoControl+0x6b:
0033:00007ff8`24eeb0bb 0f1f440000 nop dword ptr [rax+rax]
kd> r rax
rax=0000000000000000
kd> t
KERNELBASE!DeviceIoControl+0x70:
0033:00007ff8`24eeb0c0 3d03010000 cmp eax,103h
kd>
KERNELBASE!DeviceIoControl+0x75:
0033:00007ff8`24eeb0c5 7520 jne KERNELBASE!DeviceIoControl+0x97 (00007ff8`24eeb0e7)
kd>
KERNELBASE!DeviceIoControl+0x97:
0033:00007ff8`24eeb0e7 85c0 test eax,eax
kd>
KERNELBASE!DeviceIoControl+0x99:
0033:00007ff8`24eeb0e9 0f8808010000 js KERNELBASE!DeviceIoControl+0x1a7 (00007ff8`24eeb1f7)
kd>
KERNELBASE!DeviceIoControl+0x9f:
0033:00007ff8`24eeb0ef 488b8c24a0000000 mov rcx,qword ptr [rsp+0A0h]
kd>
KERNELBASE!DeviceIoControl+0xa7:
0033:00007ff8`24eeb0f7 4885c9 test rcx,rcx
kd>
KERNELBASE!DeviceIoControl+0xaa:
0033:00007ff8`24eeb0fa 7406 je KERNELBASE!DeviceIoControl+0xb2 (00007ff8`24eeb102)
kd>
KERNELBASE!DeviceIoControl+0xac:
0033:00007ff8`24eeb0fc 8b442458 mov eax,dword ptr [rsp+58h]
kd>
KERNELBASE!DeviceIoControl+0xb0:
0033:00007ff8`24eeb100 8901 mov dword ptr [rcx],eax
kd>
KERNELBASE!DeviceIoControl+0xb2:
0033:00007ff8`24eeb102 b801000000 mov eax,1
kd>
KERNELBASE!DeviceIoControl+0xb7:
0033:00007ff8`24eeb107 488b5c2470 mov rbx,qword ptr [rsp+70h]
kd>
KERNELBASE!DeviceIoControl+0xbc:
0033:00007ff8`24eeb10c 4883c460 add rsp,60h
kd>
KERNELBASE!DeviceIoControl+0xc0:
0033:00007ff8`24eeb110 5f pop rdi
kd>
KERNELBASE!DeviceIoControl+0xc1:
0033:00007ff8`24eeb111 c3 ret
kd>
KERNEL32!DeviceIoControlImplementation+0x81:
0033:00007ff8`267f5611 0f1f440000 nop dword ptr [rax+rax]
kd>
KERNEL32!DeviceIoControlImplementation+0x86:
0033:00007ff8`267f5616 488b5c2450 mov rbx,qword ptr [rsp+50h]
In my symbolized trace, I can see:
ntdll!NtDeviceIoControlFile+0x10
ntdll!NtDeviceIoControlFile+0x12
nt!KiSystemCall64+0x0
nt!KiSystemCall64+0x3
nt!KiSystemCall64+0xc
nt!KiSystemCall64+0x15
nt!KiSystemCall64+0x17
...
nt!IopSynchronousServiceTail+0x12d
nt!IopSynchronousServiceTail+0x137
nt!IopSynchronousServiceTail+0x13d
nt!IopSynchronousServiceTail+0x140
nt!IopSynchronousServiceTail+0x146
GetNameByOffset failed with hr=80004005
ioctl.bin.trace:1090: Symbolization of 0 failed ('0x'), skipping
So it looks like the fuzzer makes an invalid dereference inside the syscall. I used a new state for this output, but the error looks identical.
Of course, no worries at all, thanks for giving the tool a shot :)
Hehe what you described seems like an interesting investigation - it doesn't seem to look like anything I've seen which is exciting. I'm happy to take it from there to try to figure out what happens.
Will try to have a look at this this weekend :)
Cheers
Thank you! Meanwhile I'll play with the different targets to see if there is any difference. I am not sure if I am not missing something stupid, since it was working for everyone else here. My setup is pretty ordinary, I just created a brand new VM in Hyper-V, then followed the instructions. Also, if you want me to upload the VM or something else, I am happy to assist you.
I got it working, though I am still not sure what went wrong. I used a different Windows image (Microsoft Windows [Version 10.0.19044.1288] instead of the previous Microsoft Windows [Version 10.0.19044.2006]) and enabled generation 2 when I created the new VM. Other than this, everything was the same.
Then I tried to take a dump twice, it had around 3.5GB (instead of the usual 1.5GB-1.8GB). I am happy that your fuzzer finally works (it's amazing and thanks for making it public), we can close the issue if you want. Thanks again for your help, I really appreciate it!
All right, I had some time to check out this issue after work. The TL;DR is that it seems like HEVD's PAGE section is paged out in your dump.
Here is the longer version of the story; maybe it's also useful for others who want to debug issues they encounter w/ wtf. Put on your seat belt, let's go!
First, I downloaded the provided mem.dmp and regs.json, and I was able to reproduce the issue by launching:
> cd 33
> echo A > foo.bin
> wtf.exe run --name hevd --state=. --input foo.bin --backend=bxcpu
...
Translation of GVA 0xfffff80711b789c0 failed
Okay, cool. One thing that is important to understand is that this message is actually printed by wtf itself, not bxcpu or any other component. wtf has logic to interact with virtual and physical memory (VirtRead* / VirtWrite* / PhysRead* / PhysWrite*).
For this to work, wtf also needs to know how to parse page tables and do virtual to physical translation. The error above tells us that wtf wasn't able to translate 0xfffff80711b789c0 to a physical address.
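To make the translation step a bit more concrete, here is a minimal sketch of how a 4-level x64 page walk (4KB pages) splits a virtual address into table indices. The type and function names below are made up for illustration and are not wtf's actual implementation; the walk fails as soon as one of the entries along the way doesn't have its Present bit set, which is the kind of condition that would produce the error above.

```cpp
#include <cstdint>

// Illustrative names only; these are NOT wtf's own types.
// Indices consumed by a 4-level x64 page walk with 4KB pages.
struct PageWalkIndices_t {
  uint64_t Pml4;   // bits 47..39
  uint64_t Pdpt;   // bits 38..30
  uint64_t Pd;     // bits 29..21
  uint64_t Pt;     // bits 20..12
  uint64_t Offset; // bits 11..0
};

// Split a guest virtual address into its page-walk indices.
constexpr PageWalkIndices_t SplitGva(const uint64_t Gva) {
  return PageWalkIndices_t{(Gva >> 39) & 0x1ff, (Gva >> 30) & 0x1ff,
                           (Gva >> 21) & 0x1ff, (Gva >> 12) & 0x1ff,
                           Gva & 0xfff};
}

// A page-table entry can only be followed if its Present bit (bit 0) is set.
constexpr bool IsPresent(const uint64_t Entry) { return (Entry & 1) != 0; }

// The address from the error message, decomposed at compile time.
static_assert(SplitGva(0xfffff80711b789c0).Pml4 == 0x1f0, "PML4 index");
static_assert(SplitGva(0xfffff80711b789c0).Pdpt == 0x1c, "PDPT index");
static_assert(SplitGva(0xfffff80711b789c0).Pd == 0x8d, "PD index");
static_assert(SplitGva(0xfffff80711b789c0).Pt == 0x178, "PT index");
static_assert(SplitGva(0xfffff80711b789c0).Offset == 0x9c0, "page offset");
```

A real walker would read the entry at each level from physical memory, starting at the table pointed to by cr3, and stop at the first entry for which IsPresent returns false.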
Why? Well, let's find out. The first thing to check in those cases is to load up the mem.dmp in Windbg and try to read at this address to see whether this is potentially a bug in wtf or not:
> windbgx -z mem.dmp
...
kd> db 0xfffff80711b789c0
fffff807`11b789c0 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
fffff807`11b789d0 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
Okay so at this point, Windbg agrees w/ wtf which is good for wtf. The second logical step is to grab a RIP trace of the testcase to be able to follow what happens precisely, so let's do that:
> wtf.exe run --name hevd --state=. --input foo.bin --backend=bxcpu --trace-type=rip
The RIP trace is basically a list of RIP addresses written into a file without any symbolization, which is not super useful. To make it more useful, you need to symbolize it; I created symbolizer, which does just that.
> symbolizer.exe --input foo.bin.trace --crash-dump mem.dmp > foo.bin.trace.symbolized.txt
Now, you can open foo.bin.trace.symbolized.txt in a text editor and follow the flow along. It's always good to skim through the file first to get an idea of where the flow goes. In that case, if you skip to the end you can see something weird:
nt!IopSynchronousServiceTail+0xf6
nt!IopSynchronousServiceTail+0x100
GetNameByOffset failed with hr=80004005
foo.bin.trace:1090: Symbolization of 0 failed ('0x'), skipping
[1 / 1] foo.bin.trace done
Completed symbolization of 1k addresses (1 failed) in 0s across 1 files.
As you can see, it seems an error was hit during symbolization of an address which is not expected. Because the symbolized file and the raw file are basically one address per line, it's easy to go to the same line number to see what's going on; here's what I found:
0xfffff8070be75843
0xfffff8070be75846
0xfffff8070be75850
0x
Okay, makes sense. What probably happens is that when wtf hits an unrecoverable issue while doing a virt->phys translation, it terminates the process on the spot. The buffer of the trace file is never flushed, so the end of the trace is missing.
Cool, so let's just update BochscpuBackend_t::BeforeExecutionHook's code to add a fflush call:
#ifdef WINDOWS
__declspec(safebuffers)
#endif
void BochscpuBackend_t::BeforeExecutionHook(
/*void *Context, */ uint32_t, void *Ins) {
// ...
fmt::print(TraceFile_, "{:#x}\n", Rip);
fflush(TraceFile_);
Recompile, re-generate a trace, re-symbolize it and bingo we get a trace that doesn't stop abruptly now. Here is the end of the symbolized trace now:
HEVD+0x855c3
HEVD+0x855c5
HEVD+0x855cb
HEVD+0x855cd
HEVD+0x855d3
HEVD+0x855d5
HEVD+0x855d7
HEVD+0x855d9
HEVD+0x855db
HEVD+0x855e0
HEVD+0x855e7
HEVD+0x855ea
nt!DbgPrintEx+0x0
Okay so the hevd fuzzing module sets a breakpoint on DbgPrintEx that dumps the strings and basically nops those calls:
if (!g_Backend->SetBreakpoint("nt!DbgPrintEx", [](Backend_t *Backend) {
const Gva_t FormatPtr = Backend->GetArgGva(2);
const std::string &Format = Backend->VirtReadString(FormatPtr);
DebugPrint("DbgPrintEx: {}", Format);
Backend->SimulateReturnFromFunction(0);
})) {
DebugPrint("Failed to SetBreakpoint DbgPrintEx\n");
return false;
}
What happens here is that Backend->VirtReadString() wants to read the format string passed to DbgPrintEx but fails to do so, because it appears that the received virtual address doesn't exist in the page tables.
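As a reminder of where arguments live when a breakpoint fires on the first instruction of a function, here is a small sketch of the Microsoft x64 calling convention for integer / pointer arguments. The names are hypothetical and this is not how wtf implements GetArgGva; it only illustrates why a zero-based index of 2 maps to r8.

```cpp
#include <cstdint>

// Hypothetical helper types, for illustration only.
enum class ArgLocation_t { Rcx, Rdx, R8, R9, Stack };

// Microsoft x64 convention: the first four integer / pointer arguments are
// passed in rcx, rdx, r8 and r9; the remaining ones go on the stack.
constexpr ArgLocation_t LocateArg(const uint64_t Idx) {
  return Idx == 0   ? ArgLocation_t::Rcx
         : Idx == 1 ? ArgLocation_t::Rdx
         : Idx == 2 ? ArgLocation_t::R8
         : Idx == 3 ? ArgLocation_t::R9
                    : ArgLocation_t::Stack;
}

// At function entry, [rsp] holds the return address and each argument has an
// 8-byte slot right above it (the first four slots are the shadow space).
// So the fifth argument (Idx == 4) is read from rsp+0x28.
constexpr uint64_t StackSlotAddress(const uint64_t Rsp, const uint64_t Idx) {
  return Rsp + 8 + (Idx * 8);
}

static_assert(LocateArg(2) == ArgLocation_t::R8, "3rd argument is in r8");
static_assert(StackSlotAddress(0x1000, 4) == 0x1028, "5th argument slot");
```

Floating point arguments use the xmm registers instead, which this sketch deliberately ignores.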
Based on the code above, the argument that is a problem is the 3rd argument, i.e. the r8 register; let's check what the caller code looks like in Windbg:
kd> ub fffff80711b755f0
HEVD+0x855d3:
fffff807`11b755db ba03000000 mov edx,3
fffff807`11b755e0 4c8d05d9330000 lea r8,[HEVD+0x889c0 (fffff807`11b789c0)]
fffff807`11b755e7 8d4a4a lea ecx,[rdx+4Ah]
fffff807`11b755ea ff1518caf7ff call qword ptr [HEVD+0x2008 (fffff807`11af2008)]
kd> dqs fffff807`11af2008
fffff807`11af2008 fffff807`0bb7d1d0 nt!DbgPrintEx
Okay so we clearly see the call that will invoke DbgPrintEx, and we can see how the r8 register gets loaded with 0xfffff80711b789c0, which is the address that doesn't get translated.
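If you want to double check the addresses Windbg prints for those two instructions, the RIP-relative arithmetic is easy to redo by hand: the target is the address of the next instruction plus the signed 32-bit displacement encoded in the instruction bytes. A minimal sketch (the function name is mine) using the values from the disassembly above:

```cpp
#include <cstdint>

// Target of a RIP-relative operand: address of the *next* instruction
// (instruction address + instruction length) plus the signed displacement.
constexpr uint64_t RipRelativeTarget(const uint64_t InsAddress,
                                     const uint64_t InsLength,
                                     const int32_t Disp32) {
  return InsAddress + InsLength +
         static_cast<uint64_t>(static_cast<int64_t>(Disp32));
}

// lea r8, [rip+0x33d9]: bytes 4c 8d 05 d9 33 00 00 (7 bytes long).
static_assert(RipRelativeTarget(0xfffff80711b755e0, 7, 0x33d9) ==
                  0xfffff80711b789c0,
              "format string address");

// call qword ptr [rip-0x835e8]: bytes ff 15 18 ca f7 ff (6 bytes long);
// the little-endian displacement 0xfff7ca18 is -0x835e8 once sign-extended.
static_assert(RipRelativeTarget(0xfffff80711b755ea, 6, -0x835e8) ==
                  0xfffff80711af2008,
              "import slot holding nt!DbgPrintEx");
```

Both results match what Windbg displays: HEVD+0x889c0 for the string and HEVD+0x2008 for the import slot.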
Uh, interesting. Let's dump the PE header of HEVD and check where this address is supposed to be pointing:
kd> !address fffff807`11b78970
...
VA Type: BootLoaded
Module name: HEVD.sys
Module path: [\??\C:\Users\user\Downloads\hevd-unpacked\driver\vulnerable\x64\HEVD.sys]
kd> ? fffff807`11b78970 - hevd
Evaluate expression: 559472 = 00000000`00088970
kd> !dh -a hevd
...
SECTION HEADER #5
PAGE name
4BB5 virtual size
85000 virtual address
4C00 size of raw data
1A00 file pointer to raw data
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
60000020 flags
Code
(no align specified)
Execute Read
Okay so the pointer is actually pointing into the PAGE section, which is a section that can be paged out to disk, as the name suggests. This is probably what happens here: the page holding the string is paged out, its content is NOT in the dump, and so the read fails. On a real system, accessing this address would trigger a page fault whose handler would page that memory back in from disk and execution would carry on, but wtf doesn't have any disk so that flow is not possible.
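The same check can be written down in a few lines: compute the RVA of the pointer and see which section it lands in, using the numbers from the !dh output above and the module base reported by !address. This is only a sketch with made-up names; in particular, treating the absence of IMAGE_SCN_MEM_NOT_PAGED (0x08000000) as "may be paged out" is a simplification on my part.

```cpp
#include <cstdint>

// Subset of a PE section header, with illustrative names.
struct Section_t {
  uint32_t VirtualAddress;
  uint32_t VirtualSize;
  uint32_t Characteristics;
};

// Value of IMAGE_SCN_MEM_NOT_PAGED from the PE format.
constexpr uint32_t kImageScnMemNotPaged = 0x08000000;

// Does the RVA fall inside [VirtualAddress, VirtualAddress + VirtualSize)?
constexpr bool Contains(const Section_t &Section, const uint64_t Rva) {
  return Rva >= Section.VirtualAddress &&
         Rva < uint64_t(Section.VirtualAddress) + Section.VirtualSize;
}

// Simplification: a section without the NOT_PAGED flag may be paged out.
constexpr bool MayBePagedOut(const Section_t &Section) {
  return (Section.Characteristics & kImageScnMemNotPaged) == 0;
}

// HEVD's PAGE section and module base, taken from the dump above.
constexpr Section_t kHevdPage = {0x85000, 0x4bb5, 0x60000020};
constexpr uint64_t kHevdBase = 0xfffff80711af0000;

static_assert(0xfffff80711b789c0 - kHevdBase == 0x889c0, "RVA of the string");
static_assert(Contains(kHevdPage, 0xfffff80711b789c0 - kHevdBase),
              "the format string lives in PAGE");
static_assert(MayBePagedOut(kHevdPage), "PAGE is not marked NOT_PAGED");
```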
Anyways, we can also dig a bit deeper. I looked at my own build of HEVD to see how the IOCTL handler behaves when it receives the 0xdeadbeef IOCTL code, and it ends up executing the code below:
PAGE:00000001400855D7 mov edx, 3
PAGE:00000001400855DC lea r8, aInvalidIoctlCo ; "[-] Invalid IOCTL Code: 0x%X\n"
PAGE:00000001400855E3 lea ecx, [rdx+4Ah]
PAGE:00000001400855E6 call cs:__imp_DbgPrintEx
PAGE:00000001400855EC mov esi, 0C0000010h
That makes sense, the driver basically tells the user that they sent a busted code. It also sets an error code that we can also see in the code from the dump:
kd> u HEVD+0x855ea
HEVD+0x855ea:
fffff807`11b755ea ff1518caf7ff call qword ptr [HEVD+0x2008 (fffff807`11af2008)]
fffff807`11b755f0 be100000c0 mov esi,0C0000010h
Cool, everything checks out!
That class of issue is pretty annoying. It happens a LOT in user-mode and that's why I made lockmem. For kernel-mode I actually don't even know how to mimic that same behavior (I've done some experiments on the fbl_kern branch but it hasn't panned out yet). Interestingly, I haven't run into that issue in kernel-mode yet - I wonder how much memory you had configured your VM with? I usually set mine to 4GB.
In theory, you can also disable the pagefile (which is the disk file where paged-out data gets written to) but I haven't played much with this myself.
Anyways, thanks again for the actionable report!
Cheers
Thank you very much for the detailed writeup! :) I was using 4GB, and since the issue reoccurred on a new VM, I ended up disabling the pagefile. This helped.
Is there the same issue with KVM? How can I take the snapshot on Linux? I know it's probably not supported yet, but what are the options?
Interesting! Something that might be confusing is that every backend is meant to run Windows code; KVM included. Dumping / running / fuzzing Linux code is not supported.
If you're really interested in that, there's been some work in #102 but it is experimental. I have personally not tried it out.
Cheers
I'll close this as the original issue was fixed!
Cheers