HansKristian-Work / vkd3d-proton

Fork of VKD3D. Development branches for Proton's Direct3D 12 implementation.

CPU optimization exploration tracker

HansKristian-Work opened this issue · comments

The idea of this issue is to explore CPU optimizations in vkd3d-proton. For a game to be considered here it should be CPU bound with significant API overhead, i.e., we can meaningfully improve game performance through perf tuning on our end.

Information needed:

  • Game name (AppID)
    • How to reproduce CPU heavy scene -> minimum effort required to get to it from a fresh game. Screenshots are helpful.

Monster Hunter: Rise (1446780)

Details TBD

Monster Hunter: Rise saw large improvements recently with the descriptor copy optimizations (90 -> 100 fps), and there are even more gains with the descriptor punchthrough path (100 -> 110 fps). However, it is still CPU limited. The most CPU-limited area is right after you start the first hunt, when looking into the distance:

Screenshot

Screenshot from 2022-03-04 14-50-52

To get to it, start a new game and mash through the tutorial in the village until the quest-giver allows you to take the first hunt.

DEATH STRANDING is also a good candidate to look into. It's usually CPU-bound with descriptor copies high in perf top, especially in later areas, but also right after starting the game, particularly at lower resolutions:

Screenshot

Screenshot from 2022-03-04 15-03-36

(Savegames for this game are not shareable, and it takes forever to reach the later, more CPU-bound areas.)

Control (870780)

Almost everywhere. I first see it in the first scene with the janitor, but it almost never goes below 50 fps.

Screenshots

control3
control1
control2

What exactly is the issue with Control? CPU limit doesn't necessarily mean that our code runs into an obvious bottleneck and performance should generally be good in that game, assuming reasonable hardware and a non-borked wine configuration.

I did some tests on Windows and there is a huge difference compared to vkd3d-proton in some scenes. On Windows, even with low settings and a render resolution of 960x540, performance was always limited by the GPU. On Linux with the same config I get half the fps and 45% GPU load.

Windows dx12

control_dx12 2022-04-10 11-40-22

Proton dx12

control4

Proton dx11

control5

Try VKD3D_CONFIG=no_upload_hvv maybe. Differences that huge are normally not caused by optimization issues.

Also, please mention your hardware when complaining about performance...

Hardware info

VKD3D_CONFIG=no_upload_hvv has no visible effect on performance. My GPU (RX 590) has 8 GB of VRAM.
I see a direct correlation between fps and CPU frequency in this scene.
With maximum frequency (3.3 GHz) - 79 fps (143 fps on Windows with the same max frequency but the default governor)
With frequency fixed at 3.0 GHz - 71 fps
With frequency fixed at 2.5 GHz - ~50 fps
For comparison, the dx11 version with frequency fixed at 1.2 GHz - 120 fps with 100% GPU load (dx12 - 25 fps)
Tests were conducted with the performance governor:
cpupower frequency-set --governor performance
cpupower frequency-set -f <freq>

Setting the governor back to the default (schedutil) leads to low, unstable fps ranging from 40 to 53.

It's definitely CPU bound, and this doesn't happen on Windows or with the dx11 version under Proton.
I can check performance on Windows with limited CPU frequency if it helps.

UPDATE:
On Windows the minimum available render resolution is 720p.
With balanced power settings (max frequency 3.3 GHz) - 143 fps
With CPU frequency fixed at 1.2 GHz - 65 fps
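
To make the frequency scaling easier to read, here is a rough sketch that only re-uses the numbers quoted above. The helper names (`fps_per_ghz`, `relative_drop`) are made up for illustration, the frequencies are assumed to be in GHz, and the Windows figures were taken at 720p rather than 960x540, so the comparison is indicative only.

```python
# Figures quoted in the comment above for Control.
# Keys: fixed CPU frequency (assumed GHz). Values: reported fps.
linux_dx12 = {3.3: 79, 3.0: 71, 2.5: 50, 1.2: 25}  # 2.5 GHz was reported as "~50"
windows_dx12 = {3.3: 143, 1.2: 65}                 # measured at 720p


def fps_per_ghz(samples):
    """Return {frequency: fps / frequency} for each measurement."""
    return {freq: fps / freq for freq, fps in samples.items()}


def relative_drop(samples):
    """Return (frequency drop, fps drop) between the fastest and slowest sample."""
    hi, lo = max(samples), min(samples)
    freq_drop = 1 - lo / hi
    fps_drop = 1 - samples[lo] / samples[hi]
    return freq_drop, fps_drop


if __name__ == "__main__":
    for name, samples in (("Linux dx12", linux_dx12), ("Windows dx12", windows_dx12)):
        for freq, ratio in fps_per_ghz(samples).items():
            print(f"{name} @ {freq:.1f} GHz: {ratio:.1f} fps per GHz")
        freq_drop, fps_drop = relative_drop(samples)
        print(f"{name}: frequency -{freq_drop:.0%}, fps -{fps_drop:.0%}")
```

If fps falls roughly in step with frequency, the scene is CPU bound at that point; a much smaller fps drop would suggest the GPU is the limit instead.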

I have similar issues to @kermeat with Control; DXVK appears to give much better performance than VKD3D. Certain areas of the map seem to be CPU bound with VKD3D, dropping GPU usage down to 40 - 50% (FPS drops to 40-50 accordingly). I don't get this with DXVK.

Hardware Info

Both of the screenshots below were captured using GE-Proton7-37, with exactly the same graphical options set, at native 1440p.

Launch options for VKD3D: PROTON_ENABLE_NVAPI=1 VKD3D_CONFIG=dxr11 mangohud %command% -skipStartScreen -dx12
VKD3D

Launch options for DXVK: PROTON_ENABLE_NVAPI=1 mangohud %command% -skipStartScreen -dx11
DXVK

Spider-Man: Remastered (1817070)

Spider-Man: Miles Morales (1817190)

Those two seem to be by far the most CPU heavy games with vkd3d-proton, at least on Nvidia GPUs.

Test case

20221128232821_1

Sitting on a lantern in the middle of Times Square in Miles Morales.
Settings: maxed out, including ray tracing, except that the RT distance is kept at the middle setting, which is the default. Resolution doesn't matter as it's CPU limited in all cases.

Results

  • 37 fps with VK_EXT_mutable_descriptor_type
  • 37 fps with VK_EXT_descriptor_buffer, but vkGetDescriptorEXT has to go through the Wine syscall path
  • 47 fps with VK_EXT_descriptor_buffer when that function uses a direct call
  • 42 fps with vkd3d-proton master on Windows (with VK_EXT_mutable_descriptor_type)
  • 61 fps on Windows

Unfortunately Windows is 1.3x as fast as the fastest result I got on Linux.
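
For reference, the ratios between the results above can be worked out with a small sketch like the following. It only uses the fps figures from the list; the dictionary keys and the `speedup` helper are hypothetical labels, not anything from vkd3d-proton.

```python
# fps figures from the results list above (Miles Morales, Times Square test case).
results = {
    "linux_mutable_descriptor_type": 37,
    "linux_descriptor_buffer_syscall": 37,
    "linux_descriptor_buffer_direct": 47,
    "windows_vkd3d_proton_master": 42,
    "windows_native": 61,
}


def speedup(fast, slow):
    """Return how many times faster `fast` is than `slow`, by fps."""
    return results[fast] / results[slow]


if __name__ == "__main__":
    # Native Windows against the best Linux result.
    ratio = speedup("windows_native", "linux_descriptor_buffer_direct")
    print(f"Windows vs best Linux: {ratio:.2f}x")
    # Gain from calling vkGetDescriptorEXT directly instead of via the syscall path.
    gain = speedup("linux_descriptor_buffer_direct", "linux_descriptor_buffer_syscall")
    print(f"Direct call vs syscall path: {gain:.2f}x")
```

The direct-call gain (47 vs 37 fps) is nearly as large as the remaining gap to native Windows, which suggests the per-descriptor call overhead matters a lot in this scene.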

VKD3D profiling result:
milesprofiling.txt

As text, sorted by ticks:
milesprofiling.txt