BenjaminKim / dokanx

user-mode filesystem framework for Windows

BSODs after DokanMounter service restart

don-Pardon opened this issue · comments

Hi everyone.
Here is the thing (on Win8.1 Pro, x64, latest available dokanx build):

  1. Install dokanx.
  2. Run "mirrorfs /t 1 /r c: /l z" (I tested with a single-threaded fs-app, and the BSOD logs were collected for the single-threaded app, but a multi-threaded mirrorfs caused BSODs as well).
  3. Restart DokanMounter service.
  4. Kill mirrorfs.
    After some time a BSOD will occur. If it doesn't, run mirrorfs again.

I tried several times to detect what the hell happened, and the BSODs were different each time. Here are the full BSOD descriptions (I didn't figure out how to attach a txt file here, so it's an external link, sorry). Here is the list of the most "popular" BSODs:

  • UNEXPECTED_KERNEL_MODE_TRAP
  • SYSTEM_SERVICE_EXCEPTION
  • PAGE_FAULT_IN_NONPAGED_AREA
  • BAD_POOL_CALLER

Most of them point to a memory problem: an attempt to delete something that was already deleted, an attempt to access something that shouldn't be accessed, etc. Yet none of them points at dokanx directly.

Frankly speaking, I'm not that good at driver development or at BSOD analysis, so I tried to analyze what led to this behavior. Here is what I found out:

  1. When mirrorfs (or any other fs-app) calls the DokanMain function, dokanx_mount.exe receives a DOKAN_CONTROL_MOUNT event and inserts a mount entry into its entry list (mounter.cpp):
if (DokanControlMount(Control->MountPoint, Control->DeviceName)) {
    Control->Status = DOKAN_CONTROL_SUCCESS;
    InsertMountEntry(Control);
} else {
    Control->Status = DOKAN_CONTROL_FAIL;
}

And when the fs-app detaches, dokanx_mount.exe receives a DOKAN_CONTROL_UNMOUNT event, tries to find the mount record, and unmounts the mount point:

mountEntry = FindMountEntry(Control);
if (mountEntry == NULL) {
    if (Control->Option == DOKAN_CONTROL_OPTION_FORCE_UNMOUNT &&
        DokanControlUnmount(Control->MountPoint)) {
        Control->Status = DOKAN_CONTROL_SUCCESS;
        break;
    }
    Control->Status = DOKAN_CONTROL_FAIL;
    break;
}
if (DokanControlUnmount(mountEntry->MountControl.MountPoint)) {
    Control->Status = DOKAN_CONTROL_SUCCESS;
    if (wcslen(Control->DeviceName) == 0) {
        wcscpy_s(Control->DeviceName, sizeof(Control->DeviceName) / sizeof(WCHAR),
            mountEntry->MountControl.DeviceName);
    }
    RemoveMountEntry(mountEntry);
} else {
    mountEntry->MountControl.Status = DOKAN_CONTROL_FAIL;
    Control->Status = DOKAN_CONTROL_FAIL;
}

In the normal case the record is found, DokanControlUnmount is called, and DefineDosDevice(DDD_REMOVE_DEFINITION, drive, NULL) is executed, so the previously mounted volume is detached. Everyone's happy. In the normal case. But when the DokanMounter service is restarted, it loses its mount entry list, and when a DOKAN_CONTROL_UNMOUNT event arrives, dokanx_mount.exe doesn't know what to do with that mount entry (unless the DOKAN_CONTROL_OPTION_FORCE_UNMOUNT option is specified, but that can only be specified if the unmount is performed via the dokanx_control utility).
The point is that after the DokanMounter service is restarted, unmounting fails (unless it is a forced unmount), and after some time something memory-incorrect happens in the kernel and a BSOD occurs.
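
The failure mode described above can be modelled outside the driver. The following is only a sketch in portable C: `MountList`, `list_insert`, `list_restart`, `list_find` and `handle_unmount` are hypothetical names that stand in for the mounter's in-memory list and its DOKAN_CONTROL_UNMOUNT branch, and the forced path simply assumes that DokanControlUnmount would succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ENTRIES 8
#define NAME_LEN 16

/* Hypothetical, simplified stand-in for the mounter's in-memory entry list. */
typedef struct {
    char mount_point[MAX_ENTRIES][NAME_LEN];
    int count;
} MountList;

/* Models InsertMountEntry: remember a mount point after a successful mount. */
static void list_insert(MountList *list, const char *mount_point) {
    assert(list->count < MAX_ENTRIES);
    strncpy(list->mount_point[list->count], mount_point, NAME_LEN - 1);
    list->mount_point[list->count][NAME_LEN - 1] = '\0';
    list->count++;
}

/* Models a service restart: the list lives only in process memory, so it is lost. */
static void list_restart(MountList *list) {
    list->count = 0;
}

/* Models FindMountEntry: returns the entry index, or -1 if it is unknown. */
static int list_find(const MountList *list, const char *mount_point) {
    for (int i = 0; i < list->count; i++) {
        if (strcmp(list->mount_point[i], mount_point) == 0) {
            return i;
        }
    }
    return -1;
}

/* Models the unmount branch quoted above; returns true for a SUCCESS status. */
static bool handle_unmount(MountList *list, const char *mount_point, bool force) {
    int i = list_find(list, mount_point);
    if (i < 0) {
        /* Unknown entry: only a forced unmount is attempted.
           Assumption: the forced DokanControlUnmount call succeeds. */
        return force;
    }
    /* Known entry: drop it by moving the last entry into its slot. */
    memmove(list->mount_point[i], list->mount_point[list->count - 1], NAME_LEN);
    list->count--;
    return true;
}
```

Inserting "z", calling `list_restart`, and then calling `handle_unmount` without the force flag returns false in this model, which corresponds to the DOKAN_CONTROL_FAIL path that precedes the BSOD in the report.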

Again, I have very little knowledge about drivers and it would take me a lot of time to discover what is happening in kernel mode and why, so if anyone is willing to dig up the truth, you are welcome.
For now I have some solution-like thoughts:

  1. Why do we need DOKAN_CONTROL_OPTION_FORCE_UNMOUNT? If we remove the check for that option and try to unmount everything that is requested, what problems will it cause? Something like this:
mountEntry = FindMountEntry(Control);
if (mountEntry == NULL) {
    if (DokanControlUnmount(Control->MountPoint)) {
        Control->Status = DOKAN_CONTROL_SUCCESS;
        break;
    }
    Control->Status = DOKAN_CONTROL_FAIL;
    break;
}
if (DokanControlUnmount(mountEntry->MountControl.MountPoint)) {
    Control->Status = DOKAN_CONTROL_SUCCESS;
    if (wcslen(Control->DeviceName) == 0) {
        wcscpy_s(Control->DeviceName, sizeof(Control->DeviceName) / sizeof(WCHAR),
            mountEntry->MountControl.DeviceName);
    }
    RemoveMountEntry(mountEntry);
} else {
    mountEntry->MountControl.Status = DOKAN_CONTROL_FAIL;
    Control->Status = DOKAN_CONTROL_FAIL;
}

I tried this approach and everything went fine (no BSODs), except that the BOOL SendReleaseIRP(LPCWSTR DeviceName) function fails because GetRawDeviceName fails, since there is no entry list and therefore the required entry isn't found. As I understand it, the device wasn't released. How bad is that?
  2. Another approach is to add some kind of mount entry list serialization/deserialization to the DokanMounter service, so that when it is shut down it dumps the list somewhere, and when it comes back it knows about all mount entries. This approach is still kinda raw: it has to take into account the case where an fs-app detaches while the service is down, or where a new fs-app attaches. Maybe combine the two approaches...
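
The second idea could look roughly like the sketch below, written in portable C so it can be tried without the service. Everything here is an assumption for illustration: the `SavedEntry` record, the tab-separated text format, and the `save_entries` / `load_entries` helpers are hypothetical and not part of dokanx. It also does not deal with the stale-entry cases mentioned above (an fs-app detaching or attaching while the service is down).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record holding only the two fields used by the quoted code. */
typedef struct {
    char mount_point[64];
    char device_name[64];
} SavedEntry;

/* Writes one "mount_point<TAB>device_name" line per entry. */
static bool save_entries(const char *path, const SavedEntry *entries, int count) {
    assert(count >= 0);
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        return false;
    }
    for (int i = 0; i < count; i++) {
        fprintf(f, "%s\t%s\n", entries[i].mount_point, entries[i].device_name);
    }
    return fclose(f) == 0;
}

/* Reads entries back; returns how many were loaded, or -1 if the file cannot be opened. */
static int load_entries(const char *path, SavedEntry *entries, int max) {
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    char line[160];
    int n = 0;
    while (n < max && fgets(line, sizeof line, f) != NULL) {
        char *tab = strchr(line, '\t');
        if (tab == NULL) {
            continue; /* skip malformed lines */
        }
        *tab = '\0';
        char *device = tab + 1;
        device[strcspn(device, "\r\n")] = '\0'; /* strip the line ending */
        snprintf(entries[n].mount_point, sizeof entries[n].mount_point, "%s", line);
        snprintf(entries[n].device_name, sizeof entries[n].device_name, "%s", device);
        n++;
    }
    fclose(f);
    return n;
}
```

In this sketch the service would call `save_entries` on shutdown and `load_entries` on start. Each reloaded entry would still have to be checked against the actual system state before it is trusted, since the file can be out of date.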

That is it. Thanks for reading. I would like you guys to discuss this matter. Please feel free to correct me or share advice and any kind of thoughts. For now I'll use the first approach, but I think it is not fully correct and I'm not going to open a pull request for it yet.

Your BSOD

According to the BSOD, the VPB (volume parameter block) is no longer mounted and the I/O Manager performs the mount again.

nt!ObQueryNameString+0xe
fltmgr!FltpGetObjectName+0x30
fltmgr!FltpFsControlMountVolume+0xa2
fltmgr!FltpFsControl+0x131
nt!IopMountVolume+0x261
nt!IopCheckVpbMounted+0x146

By the way, a break is missing here, after this line:

status = STATUS_INVALID_PARAMETER;
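
Why a missing break matters can be shown with a small stand-alone sketch. The thread does not show the surrounding switch in the driver, so the status values, request codes and the two dispatch functions below are hypothetical. They only illustrate how an error status can be overwritten when control falls through into the next case.

```c
#include <assert.h>

/* Hypothetical status values and request codes, for illustration only. */
enum { EX_SUCCESS = 0, EX_INVALID_PARAMETER = 1 };
enum { REQ_BAD = 0, REQ_GOOD = 1 };

/* The error status is set, but without a break the next case overwrites it. */
static int dispatch_without_break(int request) {
    int status = EX_SUCCESS;
    switch (request) {
    case REQ_BAD:
        status = EX_INVALID_PARAMETER;
        /* falls through: no break here */
    case REQ_GOOD:
        status = EX_SUCCESS;
        break;
    default:
        break;
    }
    return status;
}

/* With the break in place the error status survives. */
static int dispatch_with_break(int request) {
    int status = EX_SUCCESS;
    switch (request) {
    case REQ_BAD:
        status = EX_INVALID_PARAMETER;
        break;
    case REQ_GOOD:
        status = EX_SUCCESS;
        break;
    default:
        break;
    }
    assert(status == EX_SUCCESS || status == EX_INVALID_PARAMETER);
    return status;
}
```

In this model `dispatch_without_break(REQ_BAD)` reports success for an invalid request, while `dispatch_with_break(REQ_BAD)` reports the error. What the fall-through actually does in the driver depends on the case that follows the line above.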

Because your user-mode application crashed, the VPB seems to be no longer valid. The I/O Manager calls ObQueryNameStringMode, and because of the invalid VPB the bluescreen happens.

This issue has existed for many years and nobody has tried to solve it.

Unmount problem generally - solution

When your user-mode application crashes and the unmount was not possible, the drive letter remains in Explorer. Dokan should be extended so that the DokanOptions are passed through to the mounter, and ForceUnmount should become a new option in DokanOptions. If the Dokan Mounter Service cannot find the entry in memory and the ForceUnmount flag is set, it can take the passed drive letter and check the associated device. If it's a Dokan device, it can take that device and do the unmount properly, including the SendReleaseIRP.
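
The decision described in this proposal can be sketched as a small pure function. This is only an illustration in portable C: `UnmountPlan`, `IsDokanDeviceFn`, `plan_unmount` and the stub `example_only_z_is_dokan` are hypothetical names, and how the mounter would really resolve a drive letter to its device and recognize a Dokan device is left behind the callback, because the thread does not specify it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of an unmount request. */
typedef enum {
    UNMOUNT_REFUSED,          /* nothing is unmounted */
    UNMOUNT_BY_ENTRY,         /* entry found in the in-memory list */
    UNMOUNT_BY_DEVICE_LOOKUP  /* entry missing, device resolved from the drive letter */
} UnmountPlan;

/* Hypothetical callback: resolves a drive letter to its device and
   reports whether that device is a Dokan device. */
typedef bool (*IsDokanDeviceFn)(char drive_letter);

/* Stub used only for illustration: pretends that only drive z is a Dokan device. */
static bool example_only_z_is_dokan(char drive_letter) {
    return drive_letter == 'z';
}

/* Chooses how to handle an unmount request, following the proposal above. */
static UnmountPlan plan_unmount(bool entry_found, bool force_unmount,
                                char drive_letter, IsDokanDeviceFn is_dokan_device) {
    assert(is_dokan_device != NULL);
    if (entry_found) {
        return UNMOUNT_BY_ENTRY;
    }
    if (force_unmount && is_dokan_device(drive_letter)) {
        /* The full unmount, including SendReleaseIRP, would run on this path. */
        return UNMOUNT_BY_DEVICE_LOOKUP;
    }
    return UNMOUNT_REFUSED;
}
```

Keeping the Dokan-device check behind a callback means the sketch invents no device naming scheme. A drive letter that does not resolve to a Dokan device is refused even when the force flag is set.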

SendReleaseIRP

This call is part of the unmount and it is very important. In your situation, with the force-unmount path, it may appear to work properly, because the drive letter remains and the Control Device Object may have been removed. If the Control Device Object is in fact not removed and a new one is created, the system will remain unstable whenever something tries to access the old device object, because of the issue described in the BSOD analysis above.

Next steps

The bluescreen should be analyzed; even if the user-mode application crashes, it should not cause a bluescreen. As soon as I have more time, I will start analyzing it. I hope other people are also interested in solving the problem.

Thanks for the reply and for describing what SendReleaseIRP is needed for.
I don't fully agree with the ForceUnmount option: if a user-mode fs doesn't turn this option on, the system will eventually BSOD. I think this approach should be implemented without making it optional.
The more I think about the mount entry list, the more I doubt the idea of storing it in the mount service. Why not store it in the same process that mounts/unmounts and sends the release IRP (e.g. dokanx.dll)?
BTW, there is the same discussion on the neighbor fork: dokan-dev/dokany#26

How do you know that, for example, drive Z belongs to you and not to another user on a terminal server? If you make this option the default, you will remove drive letter Z of another user. Only the release manager knows how the application should behave in the target environment, and therefore it must be an option.

Only the mount service runs in system context. Your application crashed, so you don't have the mount list, just like the restarted mount service. There is no difference between your application and the service, except that the mounter must run in system context. There is one component that still has the list in memory: the driver.

Whatever happens, you should never get a BSOD. So please download this driver, run the test again, and let me know whether everything works as expected.