GPUOpen-Drivers / AMDVLK

AMD Open Source Driver For Vulkan

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[dxvk] Optimize for the d3d9 Strict float emulation path

Blisto91 opened this issue · comments

The d3d9 Strict float emulation path in dxvk (see links below for technical description) is not enabled by default for all drivers, even though it is more correct, because it has a performance penalty compared to the default True.
Radv and now also nvk have code to optimize for this and so will both use Strict out of the box without any performance penalty and with more games functioning out of the box without visual issues.

Amdvlk currently doesn't do this and so will either have a performance penalty for any games where dxvk sets Strict by default or risk of visual issues in any games where such builtin configs doesn't exist yet. A couple of examples for illustrating the performance dip can be seen below.
Note that these games are just randomly chosen and are not meant to be worst case scenarios. Also note that my test setup is pretty high end to begin with (RX6800 and 7950x) and so does not represent a typical one.

Risen

d3d9.floatEmulation = True
emulation-true

d3d9.floatEmulation = Strict
emulation-strict

Dragons Dogma

d3d9.floatEmulation = True
dragon-emulation-true

d3d9.floatEmulation = Strict
dragon-emulation-strict

See original dxvk PR doitsujin/dxvk#2294
See also radv MR https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436

@Blisto91 Thanks for your comments. Now I'm investigating this issue on Amdvlk.
A question here:
Do you have any SPIRV generated when "d3d9.floatEmulation = True" or "d3d9.floatEmulation = Strict" ? I want to know if this setting will be reflected in the shader. Then I can do the optimization according to the related flag in SPIRV.
Thanks.

Hi there and thank you for the response. I am not personally skilled in this area but I have asked the dxvk devs for assistance when they have a bit of time.

I see GPUOpen-Drivers/llpc@e91a935 added an optimization for ((b==0.0 ? 0.0 : a) * (a==0.0 ? 0.0 : b)). But dxvk also emits fma((b==0.0 ? 0.0 : a), (a==0.0 ? 0.0 : b), c) . So unless llpc lowers fma to mul+add, you should also add a pattern that optimizes the fma version to v_fma_legacy_f32/v_mad_legacy_f32. And depending on if you run constant folding before the optimizations, you also want to handle the case where the comparison+select was optimized away for one mul operand, (a * (a==0.0 ? 0.0 : b))/fma(a, (a==0.0 ? 0.0 : b), c), if b is not constant zero.

@DadSchoorse Thanks for your comment. Now I have added more patterns as you refer, now the patterns supported is listed below:

  1. ((b==0.0 ? 0.0 : a) * (a==0.0 ? 0.0 : b)) ==>fmul_legacy(a,b)
  2. a * (a==0.0?0.0:b) or (b==0.0?0.0:a) * b ==>fmul_legacy(a.b)
  3. fma((b==0.0 ? 0.0 : a), (a==0.0 ? 0.0 : b), c) ==>fma_legacy(a,b,c)
  4. fma(a, (a==0.0 ? 0.0 : b), c) or fma(b==0.0?0.0:a, b, c) ==>fma_legacy(a,b,c)

For 2.3, one more condition is the single operand(a or b) should not be constant zero here.

Please check any missing here. Now my fix is under CI, looking forward to merge and deliver it ASAP.
Thanks.

For 2.3, one more condition is the single operand(a or b) should not be constant zero here.

What I've said before may have been a bit ambiguous, so just to make sure: For a * (a==0.0?0.0:b) it's important that b is not zero. So if (b.isConstant() && b.constantValue() != 0.0) { apply_opt(); }, not if (!b.isConstant() || b.constantValue() != 0.0).

Otherwise, your list matches what radv optimizes.

Oh, another thing I just thought of, I don't see a bit size check in GPUOpen-Drivers/llpc@e91a935 . v_mul_legacy_f32/v_fma_legacy_f32 are 32bit only.

Was this work supposed to be enabled in the 2024.Q2.1 release?
I tried a quick test with my iGPU in Risen 1 and still get a big performance drop when setting d3d9.floatEmulation = Strict

AMDVLK 2024.Q2.1

d3d9.floatEmulation = True
Q2 1-true

d3d9.floatEmulation = Strict
Q2 1-strict

RADV for comparison

d3d9.floatEmulation = True
radv-true

d3d9.floatEmulation = Strict
radv-strict

@Blisto91 Thanks for your feedback. For the cause of the issue, I have two suspects here:

  1. My fix hasn't fit the pattern of IR pattern generate by the game.
  2. Other fix which is related with fastmath flag has broken the pattern on which I have the optimized.

To confirm which one cause this issue, would you please(or ask dxvk devs for assistance) to dump the pipeline then I can check whether my optimization has been effective.

Thanks.

You can dump the shaders by setting the environment variable DXVK_SHADER_DUMP_PATH=/your/path and then running the game with DXVK.

That will export the generated SPIR-V among other things.

Any D3D9 game will work, you just need to also set the environment variable DXVK_CONFIG=d3d9.floatEmulation = Strict; to enable the accurate float behavior.

Linked is a dxvk shader dump from Risen 1 running on my 7950x iGPU with amdvlk 2024.Q2.1 and d3d9.floatEmulation = strict

https://drive.proton.me/urls/SF8RPVZ6CG#Rk7KIIG4d480

@Blisto91 Thanks. But unfortunatelly I can't access this link.... Maybe you can add the related files in this page attached?

@ruiminzhao Hi there. I hope a 7zip wrapped in a zip is fine as most formats Github allows doesn't compress enough on their own.
Risen-amdvlk-strict-float.zip

@ruiminzhao Hi there. I hope a 7zip wrapped in a zip is fine as most formats Github allows doesn't compress enough on their own. Risen-amdvlk-strict-float.zip

Thanks. I can get the log now and will have a look later.

The root cause has been found, the pattern used widely is like:
"

%2272 = select reassoc nnan nsz arcp contract afn i1 %2270, float 0.000000e+00, float %2271
%2273 = insertelement <3 x float> poison, float %2272, i64 0

%2276 = select reassoc nnan nsz arcp contract afn i1 %2274, float 0.000000e+00, float %2275
%2277 = insertelement <3 x float> %2273, float %2276, i64 1

%2280 = select reassoc nnan nsz arcp contract afn i1 %2278, float 0.000000e+00, float %2279
%2281 = insertelement <3 x float> %2277, float %2280, i64 2
..

%2293 = select reassoc nnan nsz arcp contract afn i1 %2291, float 0.000000e+00, float %2292
%2294 = insertelement <3 x float> poison, float %2293, i64 0

%2297 = select reassoc nnan nsz arcp contract afn i1 %2295, float 0.000000e+00, float %2296
%2298 = insertelement <3 x float> %2294, float %2297, i64 1

%2301 = select reassoc nnan nsz arcp contract afn i1 %2299, float 0.000000e+00, float %2300
%2302 = insertelement <3 x float> %2298, float %2301, i64 2

%2303 = fmul reassoc nnan nsz arcp contract afn <3 x float> %2281, %2302

"
It hasn't been caught, it needs to reorder the process like this:
"
Many other transforms
Scalarizer pass
fmul_legacy / fma_legacy matching
"