spillerrec / VirtualAA2

Investigate performance issue

spillerrec opened this issue

After fixing the obvious performance issues, VirtualAA2 still appears to be pretty slow. During IO, its CPU usage is about three times as high as AA2's, which is far from acceptable. It no longer saturates a core on my computer, but AA2 still feels laggy, as if IO had high latency for some reason.

Profiling with CodeXL shows that the main performance issue in the code is the encryption code. This is a bit strange, since it is just XORing, and AA2 has to do that as well. However, profiling with VTune shows NtDeviceIoControlFile() using over 90% of the CPU time, and it appears to be called from Dokan itself rather than from VirtualAA2. (The decrypt code spends about as much time as fread.) So the performance issue might be in Dokan.
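
To illustrate why plain XOR decryption should be cheap, here is a minimal sketch of a repeating-key XOR over a buffer. The function name `xor_decrypt` and the key are hypothetical, and this is not the actual PP archive scheme, which the issue does not describe. The point is only that the work is one XOR per byte, so it should cost roughly the same for VirtualAA2 as for AA2.

```python
def xor_decrypt(data: bytes, key: bytes) -> bytes:
    """Hypothetical repeating-key XOR (NOT the real PP format).

    Each output byte is the input byte XORed with the key byte at the same
    position modulo the key length, so the cost is linear in len(data).
    XOR is its own inverse: applying this twice restores the input.
    """
    if not key:
        raise ValueError("key must be non-empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
```

Because XOR is symmetric, the same function both encrypts and decrypts.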

A useful test would be to use a stock installation that hasn't split up the PP archives and only uses PassthroughFile. In that case, very little of the performance overhead should be due to VirtualAA2.
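
One way to run that comparison is to read the same file through the native path and through the Dokan mount and compare total time and the slowest single read (a rough proxy for the latency that makes AA2 feel laggy). The sketch below assumes Python 3 and a hypothetical helper `measure_reads`. The OS file cache will skew repeated runs, so results should be compared on cold and warm reads separately.

```python
import time

def measure_reads(path, block_size=64 * 1024):
    """Sequentially read `path` in blocks.

    Returns (total bytes read, total elapsed seconds, slowest single read in seconds).
    buffering=0 disables Python's own buffering so each read() goes to the
    file system; the OS page cache still applies.
    """
    total = 0
    worst = 0.0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            t0 = time.perf_counter()
            chunk = f.read(block_size)
            worst = max(worst, time.perf_counter() - t0)
            if not chunk:  # empty result means end of file
                break
            total += len(chunk)
    return total, time.perf_counter() - start, worst
```

Calling it once on the archive's real location and once on the same file inside the mounted drive (both paths are installation-specific) gives a side-by-side comparison of the Dokan overhead.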

If it appears to be caused by Dokan, try asking here:
https://groups.google.com/forum/#!forum/dokan

Also, this Dokan issue mentions both DeviceIoControl() (which calls NtDeviceIoControlFile()) and performance: dokan-dev/dokany#210
It is way over my head, so I don't know whether it is relevant, but it is being worked on, so it might be worth checking whether it makes a difference once it is included in a release.

I tested with an untouched installation and the problem is still there, with NtDeviceIoControlFile() taking the majority of the time. Apart from fread(), none of the top 10 hottest functions are called by VirtualAA2.

By default Dokan creates 5 "message pump" IO threads and limits the total to 15. Each of these threads calls DeviceIoControl() to dequeue a kernel message for processing, and that call does not return until the message has been processed by the user-mode driver and a completion notification has been sent back to the kernel via another call to DeviceIoControl(). My guess is that you're seeing extremely poor throughput because a limited number of threads are constantly waiting on DeviceIoControl() to talk to the kernel. Issue dokan-dev/dokany#210 and PR dokan-dev/dokany#307 address this problem; however, the fix won't be merged into the codebase until 1.0.0.0 is released and a few outstanding issues with Mirror are fixed.
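
The bottleneck described above can be illustrated with a toy model: a fixed pool of worker threads where each worker handles one request at a time and stays blocked for the whole round trip (dequeue, process, send completion) before it can take the next one. This is a sketch in Python, not Dokan's actual code. The thread counts 5 and 15 come from the comment above; the function name `simulate_message_pump`, the request count and the round-trip latency are hypothetical.

```python
import queue
import threading
import time

def simulate_message_pump(num_threads, num_requests, round_trip_s):
    """Toy model of blocking message-pump threads.

    Returns (elapsed seconds, peak number of requests in flight).
    The peak can never exceed num_threads, which caps throughput at
    roughly num_threads / round_trip_s requests per second.
    """
    requests = queue.Queue()
    for i in range(num_requests):
        requests.put(i)

    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def worker():
        nonlocal in_flight, peak
        while True:
            try:
                requests.get_nowait()  # "dequeue a kernel message"
            except queue.Empty:
                return
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
            time.sleep(round_trip_s)  # thread is blocked for the full round trip
            with lock:
                in_flight -= 1

    start = time.perf_counter()
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, peak
```

Running it with 5 and then 15 threads for the same workload shows the elapsed time shrinking roughly in proportion to the thread count, since no more requests than threads can ever be in flight at once.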

Thanks for the clarification. I will put this on hold until I can try out a pre-release (or perhaps build it myself once it is merged to master). Thanks for all the work you have put into solving this problem.