Gbps / gbhv

Simple x86-64 VT-x Hypervisor with EPT Hooking

HvEptAddPageHook failing

Seegee opened this issue · comments

This is an issue that I've experienced every time I load the hypervisor. I'm using 'dsefix' to disable driver signature enforcement, and using the Windows 'sc' service manager to start the driver. @autisticlittleguy mentioned that he had the same problem in issue #2.

I've added some more debug prints and found out that the failure in HvEptGetPml1Entry lies here:

if(!PML1) { HvUtilLogError("HvEptGetPml1Entry: !PML1\n"); return NULL; }

This seems to indicate that the OsPhysicalToVirtual function is failing for some reason. I haven't made any other modifications to the source besides adding debug prints.

commented

Do you know if this is a problem only on 2004? OsPhysicalToVirtual uses an internal Windows memory manager function, which unfortunately doesn't guarantee any kind of portability between versions. If a major change happened to the function such that it fails in new situations, that might be the source of the problem.

I only have a test environment set up for 2004, so I am unsure. I assumed it had something to do with the Windows version, because it seems like the hypervisor has been extensively tested on other versions and works fine.

I've printed PML2Pointer->PageFrameNumber, and that seems to be valid each time, so the issue is in the call to MmGetVirtualForPhysical itself. I've done a bit of research, and I don't see any obvious changes to this function in Windows 2004...

commented

Right now, I'm not even using a VM, just testing on my main Windows install. If I get time later, I'll set up an 1809 VM and try it out.

@autisticlittleguy are you experiencing this same issue on Windows 2004, or another version?

commented

What is the address of the hook target passed to HvEptAddPageHook?

[DEBUG] TargetFunction fffff8012dc1d2f0 HookFunction fffff8018c0c2060 OrigFunction fffff8018c0ca030 PhysicalAddress from OsVirtualToPhysical: 2e1d000

The last address is the one returned from this call:

PhysicalAddress = (SIZE_T) OsVirtualToPhysical(VirtualTarget);
	
HvUtilLogDebug("TargetFunction %llx HookFunction %llx OrigFunction %llx PhysicalAddress from OsVirtualToPhysical: %llx \n", TargetFunction, HookFunction, OrigFunction, PhysicalAddress);
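As a sanity check on the values above, here is a minimal user-mode sketch of the page arithmetic involved. The helper names are hypothetical (they are not part of gbhv), and it assumes 4 KB pages. Combining the page-aligned physical address 2e1d000 with the low 12 bits of TargetFunction gives 0x2e1d2f0, which lines up with the "OffsetIntoPage: 0x2f0" and "EPT Violation => 0x2E1D2F0" lines in the logs posted later in this thread.

```c
#include <stdint.h>

/* Assumption: 4 KB pages, as used for the hooked page in this thread. */
#define PAGE_SIZE_4K 0x1000ULL

/* Round an address down to the start of its 4 KB page. */
static inline uint64_t page_align(uint64_t addr)
{
    return addr & ~(PAGE_SIZE_4K - 1);
}

/* Byte offset of an address within its 4 KB page. */
static inline uint64_t page_offset(uint64_t addr)
{
    return addr & (PAGE_SIZE_4K - 1);
}

/* Physical address of the hooked instruction: physical page base plus
 * the page offset taken from the virtual address of the target. */
static inline uint64_t target_physical(uint64_t target_va, uint64_t page_pa)
{
    return page_align(page_pa) + page_offset(target_va);
}
```

For example, target_physical(0xfffff8012dc1d2f0, 0x2e1d000) yields 0x2e1d2f0.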

commented

That seems to be in order as well. I wonder if this is a new security protection that was slid in.

@Gbps @Seegee

My apologies for answering in an untimely manner, I lost my other GitHub account due to using a temp-mail. That being said:

I'm experiencing the same issue on both vmware and a physical machine. The issue seems to present itself randomly, the hook can work on some processors and completely fail on others. The check it fails SEEMS to ALWAYS be the one that happens after OsPhysicalToVirtual is called. The PageFrameNumber on the PML2Pointer is not null, neither is the PML2Pointer, so that makes me think it could either be a wrong PageFrameNumber, or the OsPhysicalToVirtual function failing. Sometimes though, it all works smoothly.

Windows version: 2004.

My prints atm:


        /* Translate to the PML1 pointer */
	PML1 = (PEPT_PML1_ENTRY) OsPhysicalToVirtual((PVOID)(PML2Pointer->PageFrameNumber * PAGE_SIZE));

	HvUtilLogDebug("PML2Pointer->PageFrameNumber = %p\n", PML2Pointer->PageFrameNumber);

	if(!PML1)
	{
		HvUtilLogDebug("Couldn't translate physical memory %p to virtual memory %p, aborting...\n", PhysicalAddress, PML1);
		return NULL;
	}

	/* Index into PML1 for that address */
	PML1 = &PML1[ADDRMASK_EPT_PML1_INDEX(PhysicalAddress)];

	HvUtilLogDebug("Translated PML1 for physical address %p: %p\n", PhysicalAddress, PML1);
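For readers following the index math in this snippet, here is a small user-mode sketch of how a guest physical address breaks down into EPT table indices. It assumes the standard 4-level layout with 9 index bits per level and 8-byte entries; the function names are hypothetical, and ADDRMASK_EPT_PML1_INDEX is only presumed to compute the same thing as ept_pml1_index below. For the physical address 0x2E1D000 seen in this thread, the PML2 index is 0x17 (byte offset 0xB8) and the PML1 index is 0x1D (byte offset 0xE8), which is consistent with the low bits of the PML2 and PML1 pointers in the logs.

```c
#include <stdint.h>

/* Each EPT level indexes 512 entries (9 bits); 4 KB pages leave a
 * 12-bit offset at the bottom of the address. */
static inline unsigned ept_pml4_index(uint64_t pa) { return (unsigned)((pa >> 39) & 0x1FF); }
static inline unsigned ept_pml3_index(uint64_t pa) { return (unsigned)((pa >> 30) & 0x1FF); }
static inline unsigned ept_pml2_index(uint64_t pa) { return (unsigned)((pa >> 21) & 0x1FF); }
static inline unsigned ept_pml1_index(uint64_t pa) { return (unsigned)((pa >> 12) & 0x1FF); }

/* Byte offset of an entry within its table: entries are 8 bytes wide. */
static inline uint64_t ept_entry_byte_offset(unsigned index)
{
    return (uint64_t)index * 8;
}

/* Physical address of a table, given the page frame number stored in
 * the parent entry (the PageFrameNumber * PAGE_SIZE step above). */
static inline uint64_t pfn_to_physical(uint64_t pfn)
{
    return pfn << 12;
}
```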

commented

Hmm... also Windows version 2004.

@autisticlittleboy for me, it happens 100% of the time on all processors. I'm never able to get the hook to properly initialize.

commented

I doubt the PageFrameNumber is incorrect, so the only real unknown is the translation back to virtual failing. That, or the translation to physical was incorrect to begin with.

@Seegee Hmm. I've only tested hooking syscalls so far, try hooking something that's not a syscall, maybe some random function in ntoskrnl and see if it's still failing with the OsPhysicalToVirtual check, if not, maybe the issue is within the EPT configuration. Just throwing some thoughts here since this issue is way weirder than others i encountered.

commented

I wonder if this is related to some of the additional protections placed on MmMapIoSpace, which forbid you from mapping certain areas of physical memory at the API level. I don't remember when this happened; I think during the 19xx era. It could potentially be that in 2004, there was an edit that affected the ability to translate these mappings freely.

Looking at an old article from Sina Karvandi (https://rayanfam.com/topics/inside-windows-page-frame-number-part2/), it looks like the MiGetPhysicalAddress function has, somewhat, changed. I've never done address translations manually but if the OS is trying to prevent us from getting the physical address this may be a way to get around it.

commented

Well, that was back in 2018, and neither I nor anyone else encountered this issue around that time. Technically, system calls do lie in paged memory, which could potentially be paged out at the time of the hook. A call to MmProbeAndLockPages on the target virtual address would at least lock them in, which the code base currently does not do.

https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-mmprobeandlockpages

It would be very curious, though, to ever see NtCreateFile paged out. This might be different, however, if you guys are trying to hook a different system call that is less used. I guess it's worth a shot to try to lock the pages beforehand; this would match up with the seeming randomness of @autisticlittleboy's experience.

I've seen hyperplatform and its sub projects use MmGetVirtualForPhysical and MmGetPhysicalAddress, so I doubt these functions are at fault here (given that both work just fine on my machine and in a VM). I'll try locking the pages in to see what happens.

@Seegee

I think you were right @Gbps
It worked for 4 consecutive system restarts after locking the page for read access:

PMDL NtCreateFileMDL = IoAllocateMdl(PAGE_ALIGN(NtCreateFile), PAGE_SIZE, FALSE, TRUE, NULL);
MmProbeAndLockPages(NtCreateFileMDL, KernelMode, IoReadAccess);
HvEptAddPageHook(ProcessorContext, (PVOID)NtCreateFile, (PVOID)NtCreateFileHook, (PVOID*)&NtCreateFileOriginal);
MmUnlockPages(NtCreateFileMDL);

Of course, if you use a DSE bypass technique that maps your driver, you won't have SEH and you'll error out horribly if an exception is thrown, but that's fine by me in my project. I do not want anything to run if hooks fail. Also, I mean, we do have a subverted processor: either kill PatchGuard and get SEH by adding your invalid mapped region to the tables, or get creative :)

It might be a good idea to lock the pages beforehand and only unlock them once the hypervisor is done placing its hooks and virtualizing the system. I'm still restarting and virtualizing the system over and over to check whether it's really fixed or not.

It looks like it failed on the 8th restart for one of the processors:

[*] HvInitializeAllProcessors: Starting.
[*] Total Processor Count: 2
[DEBUG] EPT PML4Entry is 0000000000000000
[DEBUG] EPT PML2 is FFFFB378CA8020B8 PML2->LargePage = 0000000000000000
[DEBUG] PML2Pointer->PageFrameNumber = 0000000000007B4C
[DEBUG] Couldn't translate physical memory 0000000002E1D000 to virtual memory 0000000000000000, aborting...
[!] HvEptAddPageHook: Failed to get PML1 entry for target address.
[*] HvInitializeLogicalProcessor[#0]: Allocated Context [Context = 0xffffe28fbd94c000]
[DEBUG] EPT PML4Entry is 0000000000000000
[DEBUG] EPT PML2 is FFFFB378CAC020B8 PML2->LargePage = 0000000000000000
[DEBUG] PML2Pointer->PageFrameNumber = 0000000000050039
[DEBUG] Translated PML1 for physical address 0000000002E1D000: FFFFE28FBDBFE0E8
[DEBUG] OffsetIntoPage: 0x2f0
[DEBUG] Number of bytes of instruction mem: 14
[DEBUG] Trampoline: 0xffffe28fb7902a90
[DEBUG] HookFunction: 0xfffff80644773140
[*] HvInitializeLogicalProcessor[#1]: Allocated Context [Context = 0xffffe28fbda6b000]
[*] VmcsRevisionNumber: 1
[+] HvEptCheckFeatures: All EPT features present.
[DEBUG] EPT: Number of dynamic ranges: 8
[DEBUG] MTRR Range: Base=0x0 End=0xFFFFFFFFF Type=0x6
[DEBUG] MTRR Range: Base=0xC0000000 End=0xFFFFFFFF Type=0x0
[DEBUG] Total MTRR Ranges Committed: 1
[DEBUG] VmxOnRegion[#0]: (V) 0xffffa38080c33000 / (P) 0x7ff5e000 [1]
[DEBUG] VmxOnRegion[#1]: (V) 0xffffa38080c71000 / (P) 0x7fe7e000 [1]
[DEBUG] HvSetupVmcsControlFields: VmError = 0
[DEBUG] HvSetupVmcsControlFields: VmError = 0
[DEBUG] GdtRegister: 0xfffff60c60a37940, Base: 0xffffa380807eafb0, Limit: 0x57
[DEBUG] GdtRegister: 0xfffff60c629ef030, Base: 0xfffff80642264fb0, Limit: 0x57
[DEBUG] VmxLaunchProcessor: VMLAUNCH....
[DEBUG] VmxLaunchProcessor: VMLAUNCH....
[+] HvInitializeAllProcessors: Success.

Nope, the issue is not related to IRQL.

Still fails for me on all 12 processors when locking the page like this.

commented

I appreciate the testing you all are doing. I wish I had a better explanation. I'll continue to try to trace this down with you.

commented

Thinking more about it, it's possible this is related to the page splitting code in HvEptSplitLargePage, since that code is responsible for generating the PML2 pointer to the PML1 entries that are retrieved immediately after by the function in question. In theory, unless you hooked two functions which share a PML2 layer, the HvEptSplitLargePage should always split a large page, and it should always match the same PML2 as the code which retrieves the PML1 (the buggy code in question).

Specifically, the value of PageFrameNumber in the PML1 translation function should always be the same as the one set here:
https://github.com/Gbps/gbhv/blob/master/gbhv/ept.c#L421

Which would end up being wrong if:

(SIZE_T)OsVirtualToPhysical(&NewSplit->PML1[0])

Returned the wrong value. But again, never seen this return the wrong value here.

Worth tracing output to ensure that NewPointer.PageFrameNumber (SplitPage) matches PML2Pointer->PageFrameNumber (GetPML1) and NewPointer is the same virtual address as PML2Pointer.
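To make that comparison concrete, here is a minimal user-mode sketch that decodes a raw PML2 entry and checks it against the physical address of the PML1 table it should point to. The names are hypothetical and not taken from gbhv. It assumes the Intel EPT layout for a PML2 entry that references a PML1 table (bits 0 to 2 are read, write and execute, bit 7 is the large-page bit) and a page frame number in bits 12 to 47. A raw value such as the 0x6ADF9007 reported in the trace that follows decodes to read/write/execute all set and a page frame number of 0x6ADF9.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of an EPT PML2 entry. */
typedef struct {
    bool read;
    bool write;
    bool execute;
    bool large_page;  /* bit 7: entry maps a 2 MB page directly */
    uint64_t pfn;     /* bits 12..47 (assumed physical address width) */
} ept_pml2_fields;

static inline ept_pml2_fields ept_decode_pml2(uint64_t raw)
{
    ept_pml2_fields f;
    f.read       = (raw & 0x1) != 0;
    f.write      = (raw & 0x2) != 0;
    f.execute    = (raw & 0x4) != 0;
    f.large_page = (raw & 0x80) != 0;
    f.pfn        = (raw >> 12) & 0xFFFFFFFFFULL;  /* 36-bit frame number */
    return f;
}

/* True when the entry is a pointer (not a large page) and its frame
 * number refers to the page-aligned physical address table_pa. */
static inline bool ept_pointer_matches(uint64_t raw, uint64_t table_pa)
{
    ept_pml2_fields f = ept_decode_pml2(raw);
    return !f.large_page
        && (table_pa & 0xFFF) == 0
        && (f.pfn << 12) == table_pa;
}
```

If the check passes for both the entry written by the split and the entry read back in the PML1 lookup, the page frame number itself can be ruled out as the culprit.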

Traced some more calls. Unsure how to interpret the result here. I need to really give a good read to the manual again. Output comes from a dual core VM.


[DEBUG] Page split! NewPointer = 000000006ADF9007
[DEBUG] Page split! NewPointer.PageFrameNumber= 000000000006ADF9
[DEBUG] EPT PML4Entry is 0000000000000000
[DEBUG] EPT PML2Pointer is FFFFBC7DB52020B8 PML2->LargePage = 0000000000000000
[DEBUG] PML2Pointer->PageFrameNumber = 000000000006ADF9
[DEBUG] Couldn't translate physical memory 0000000002E1D000 to virtual memory 0000000000000000, aborting...
[!] HvEptAddPageHook: Failed to get PML1 entry for target address.
[*] HvInitializeLogicalProcessor[#0]: Allocated Context [Context = 0xffffe30534725000]

[DEBUG] Page split! NewPointer = 0000000024BF5007
[DEBUG] Page split! NewPointer.PageFrameNumber= 0000000000024BF5
[DEBUG] EPT PML4Entry is 0000000000000000
[DEBUG] EPT PML2Pointer is FFFFBC7DB56020B8 PML2->LargePage = 0000000000000000
[DEBUG] PML2Pointer->PageFrameNumber = 0000000000024BF5
[DEBUG] Translated PML1 for physical address 0000000002E1D000 in virtual is: FFFFE3053365A0E8
[DEBUG] OffsetIntoPage: 0x2f0
[DEBUG] Number of bytes of instruction mem: 14
[DEBUG] Trampoline: 0xffffe3052d962a90
[DEBUG] HookFunction: 0xfffff80428eb3120
[*] HvInitializeLogicalProcessor[#1]: Allocated Context [Context = 0xffffe30534959000]

[*] VmcsRevisionNumber: 1
[+] HvEptCheckFeatures: All EPT features present.
[DEBUG] EPT: Number of dynamic ranges: 8
[DEBUG] MTRR Range: Base=0x0 End=0xFFFFFFFFF Type=0x6
[DEBUG] MTRR Range: Base=0xC0000000 End=0xFFFFFFFF Type=0x0
[DEBUG] Total MTRR Ranges Committed: 1
[DEBUG] VmxOnRegion[#0]: (V) 0xffffac8013cbe000 / (P) 0x7dbb5000 [1]
[DEBUG] VmxOnRegion[#1]: (V) 0xffffac8013d11000 / (P) 0x7dbac000 [1]
[DEBUG] HvSetupVmcsControlFields: VmError = 0
[DEBUG] GdtRegister: 0xffff8108147c2030, Base: 0xfffff80426664fb0, Limit: 0x57
[DEBUG] HvSetupVmcsControlFields: VmError = 0
[DEBUG] GdtRegister: 0xffff810812637940, Base: 0xffffac80137eafb0, Limit: 0x57
[DEBUG] VmxLaunchProcessor: VMLAUNCH....
[DEBUG] VmxLaunchProcessor: VMLAUNCH....
[+] HvInitializeAllProcessors: Success.
[DEBUG] EPT Violation => 0x2E1D2F0
[+] Made Exec
commented

Thanks again, that's very helpful.

Can you give me the value of &NewSplit->PML1[0] and the value of OsVirtualToPhysical(&NewSplit->PML1[0]) at this line for a successful and unsuccessful hook:
https://github.com/Gbps/gbhv/blob/master/gbhv/ept.c#L421

Hopefully that will tell the story.

Alright here's the logs with the PML1 added (dual core vm, Windows v2004, one core fails one succeeds):

[*] HvInitializeAllProcessors: Starting.
[*] Total Processor Count: 2
[DEBUG] Page split! NewPointer = 0000000023366007
[DEBUG] Page split! NewPointer.PageFrameNumber= 0000000000023366
[DEBUG] Page split! &NewSplit->PML1[0] = FFFFAE8F2C966000
[DEBUG] Page split! (SIZE_T)OsVirtualToPhysical(&NewSplit->PML1[0]) = 0000000023366000
[DEBUG] EPT PML4Entry is 0000000000000000
[DEBUG] EPT PML2Pointer is FFFFD87FB58020B8 PML2->LargePage = 0000000000000000
[DEBUG] PML2Pointer->PageFrameNumber = 0000000000023366
[DEBUG] Couldn't translate physical memory 0000000002E1D000 to virtual memory 0000000000000000, aborting...
[!] HvEptAddPageHook: Failed to get PML1 entry for target address.
[*] HvInitializeLogicalProcessor[#0]: Allocated Context [Context = 0xffffae8f2e2b2000]

[DEBUG] Page split! NewPointer = 0000000042751007
[DEBUG] Page split! NewPointer.PageFrameNumber= 0000000000042751
[DEBUG] Page split! &NewSplit->PML1[0] = FFFFAE8F2E6BF000
[DEBUG] Page split! (SIZE_T)OsVirtualToPhysical(&NewSplit->PML1[0]) = 0000000042751000
[DEBUG] EPT PML4Entry is 0000000000000000
[DEBUG] EPT PML2Pointer is FFFFD87FB5C020B8 PML2->LargePage = 0000000000000000
[DEBUG] PML2Pointer->PageFrameNumber = 0000000000042751
[DEBUG] Translated PML1 for physical address 0000000002E1D000 in virtual is: FFFFAE8F2E6BF0E8
[DEBUG] OffsetIntoPage: 0x2f0
[DEBUG] Number of bytes of instruction mem: 14
[DEBUG] Trampoline: 0xffffae8f273e5a90
[DEBUG] HookFunction: 0xfffff8020fff3170
[*] HvInitializeLogicalProcessor[#1]: Allocated Context [Context = 0xffffae8f2e26f000]

[*] VmcsRevisionNumber: 1
[+] HvEptCheckFeatures: All EPT features present.
[DEBUG] EPT: Number of dynamic ranges: 8
[DEBUG] MTRR Range: Base=0x0 End=0xFFFFFFFFF Type=0x6
[DEBUG] MTRR Range: Base=0xC0000000 End=0xFFFFFFFF Type=0x0
[DEBUG] Total MTRR Ranges Committed: 1
[DEBUG] VmxOnRegion[#0]: (V) 0xffffc88050c27000 / (P) 0x7dc4d000 [1]
[DEBUG] VmxOnRegion[#1]: (V) 0xffffc88050c3c000 / (P) 0x7db2b000 [1]
[DEBUG] HvSetupVmcsControlFields: VmError = 0
[DEBUG] HvSetupVmcsControlFields: VmError = 0
[DEBUG] GdtRegister: 0xffffac888c82efb0, Base: 0xffffc880507eafb0, Limit: 0x57
[DEBUG] GdtRegister: 0xffffac888cbe7030, Base: 0xfffff8020d864fb0, Limit: 0x57
[DEBUG] VmxLaunchProcessor: VMLAUNCH....
[DEBUG] VmxLaunchProcessor: VMLAUNCH....
[+] HvInitializeAllProcessors: Success.
[DEBUG] EPT Violation => 0x2E1D2F0
[+] Made Exec

Logging code for reference (the last print shows the raw OsVirtualToPhysical result, before the division by PAGE_SIZE that the assignment applies):

NewPointer.Flags = 0;
NewPointer.WriteAccess = 1;
NewPointer.ReadAccess = 1;
NewPointer.ExecuteAccess = 1;
NewPointer.PageFrameNumber = (SIZE_T)OsVirtualToPhysical(&NewSplit->PML1[0]) / PAGE_SIZE;

HvUtilLogDebug("Page split! NewPointer = %p\n", NewPointer);
HvUtilLogDebug("Page split! NewPointer.PageFrameNumber= %p\n", NewPointer.PageFrameNumber);
HvUtilLogDebug("Page split! &NewSplit->PML1[0] = %p\n", &NewSplit->PML1[0]);
HvUtilLogDebug("Page split! (SIZE_T)OsVirtualToPhysical(&NewSplit->PML1[0]) = %p\n", (SIZE_T)OsVirtualToPhysical(&NewSplit->PML1[0]));
commented

Well, those values also look completely normal and expected! It just absolutely seems like OsPhysicalToVirtual is just straight up failing. Which makes no sense, because it works in one processor and not in another processor. Even the memory that's being converted is just regular kernel pool memory, nothing special about it at all.

I'll keep brainstorming...

This might be a bit of a stupid idea... but considering you said that the values look completely normal and expected, and that this is just an API problem, why not skip over using the API at all? Would adding a third parameter to HvEptGetPml1Entry that simply contains the expected virtual address work? We can proceed with the regular calculation via the API, and if the API returns zero, we use what was passed in the parameter and hope for the best. Janky, stupid solution that definitely shouldn't be used in production software, but it could work, since the other processor succeeds just fine. This is definitely one of the dumbest solutions I've come up with, but I'll try it.

commented

Without figuring out the root cause of the issue, it will cause further bugs later down the line. I'm still looking into why it's happening right now.

For consistency's sake, I tested this on a Windows 10 1909 VMware VM, and the hook always worked fine. I then tested on a Windows 10 2004 VM, and it failed, as it did on my desktop.

commented

I'm not sure how relevant this is but https://github.com/hyperdbg/hyperdbg uses the same method to gather the PML1Entry. I can't get that vmm to run since it's giving weird issues where vmware simply freezes but maybe you can get it to run @Seegee . If you do, try placing ept hooks with !epthook, documentation to do it should be in the repo. If they work in a project that uses such a similar method to grab the PML1Entry we might be able to get some clues as to why it's not working in this project. If it doesn't work there either, well, that'll be interesting. I'll try getting hyperdbg to work again tomorrow. "load vmm" simply freezes my whole machine rn.

commented

I think I've located the problem, as I got it to repro on my machine... will update with more information if I'm able to fix it.

commented

@autisticlittleboy @Seegee
I've created a new branch which I believe resolves the issue:

https://github.com/Gbps/gbhv/tree/win10-2004

You can either compile it from source, or use the binary under release/. Please let me know if this resolves this bug for you so I can go ahead and merge this into master. As for root cause, below is the comment I added to ept.c talking about it:

	/*
	* Allocate the PML1 entries for the split.
	* NOTE: This would *not* need to use contiguous aligned pages normally, except for a bug which is experienced
	* in Windows 10 v2004 where changes to the nonpaged pool allocator resulted in some page aligned allocations
	* being mapped as 4MB large pages rather than the expected 4KB pages. This causes the following VtoP and PtoV
	* functions to fail, because the Mm APIs are not able to properly translate physical addresses within a large page
	* back to its virtual address due to a null PTE pointer inside the PFN database entry for the large page.
	* 
	* From my testing, I was unable to find a way to coerce Mm to split a nonpaged pool large page for me, so the best
	* alternative was to use the contiguous aligned pages allocator because, in my testing, it resulted in only 4KB virtual
	* allocations. This allocator also utilizes nonpaged pool frames, so it is more-or-less the same as the other allocator.
	*/
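The failure mode described in that comment can be illustrated with a toy model. This is a deliberately simplified sketch and not the real Windows implementation: the structure, its fields and the function name are all hypothetical, and the actual PFN database is undocumented and far richer. The only idea it models is the one stated above: a physical-to-virtual lookup that depends on a per-frame PTE reference has nothing to work with when the frame belongs to a large page, so it comes back with zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame record standing in for a PFN database entry. */
typedef struct {
    int has_pte;       /* 0 models the null PTE pointer of a large-page frame */
    uint64_t page_va;  /* virtual base of the 4 KB page; valid only if has_pte */
} toy_pfn_entry;

/* Toy physical-to-virtual lookup: returns 0 when the frame is out of
 * range or has no 4 KB PTE recorded, mirroring the failed translation
 * seen in the logs. */
static inline uint64_t toy_physical_to_virtual(const toy_pfn_entry *db,
                                               size_t frames,
                                               uint64_t pa)
{
    uint64_t pfn = pa >> 12;
    if (pfn >= frames || !db[pfn].has_pte) {
        return 0;
    }
    return db[pfn].page_va + (pa & 0xFFF);
}
```

In this model, a frame backed by a 4 KB mapping translates back to its virtual address, while a frame inside a large page returns zero even though its page frame number is perfectly valid, which matches the symptom of one processor succeeding and another failing depending on how its allocation happened to be mapped.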

Worked perfectly on Windows v2004 on a laptop that I previously wouldn't have been able to place any hooks on. I'll keep it running overnight to see if there are any stability issues, and I'll try loading the hypervisor several times to see if the issue is still present. Thank you very much for taking the time to look into this. @Gbps

commented

Closing this as it was fixed in #19.