HansKristian-Work / vkd3d-proton

Fork of VKD3D. Development branches for Proton's Direct3D 12 implementation.

DCC on AMD HW is blocked due to aliasing images with incompatible format

Venemo opened this issue

I found this peculiar case while investigating the performance of Ghostwire: Tokyo on RADV. We discussed it with Hans-Kristian, and here is a summary of that discussion.

What is happening?

The D3D12 app can use DCC, but the same app running on RADV through VKD3D-Proton can't use DCC with DXGI_FORMAT_R16G16B16A16_FLOAT.

As the linked code shows, the image is also accessed through the DXGI_FORMAT_R16G16B16A16_UINT format, which is not compatible as far as DCC is concerned.

Why?

This is because VKD3D-Proton needs to ensure that the image works with the ClearUnorderedAccessViewUint function. VKD3D-Proton's implementation of this function uses a shader that writes the image through a uint format. This is done instead of reinterpreting the uint as float, in order to preserve NaN values.

What could be done to solve this?

The implementation of ClearUnorderedAccessViewUint could be changed to distinguish two cases:

  1. When the uint reinterpreted as float is not NaN, use a shader that accesses the image in its proper format
  2. When the uint reinterpreted as float is NaN, fill a buffer with the NaN value and use a buffer-to-image copy

We expect case 1 to be by far the most common, so it is acceptable for case 2 to be slower.
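A minimal sketch of how the two cases above might be told apart for an R16G16B16A16 image. It assumes each clear component carries its FP16 bit pattern in the low 16 bits of the uint. The names `fp16_bits_are_nan`, `choose_clear_path` and `enum clear_path` are hypothetical and not taken from VKD3D-Proton:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the two cases in the list above. */
enum clear_path
{
    CLEAR_PATH_FLOAT_SHADER, /* case 1: shader store through the float format */
    CLEAR_PATH_BUFFER_COPY   /* case 2: fill a buffer, then buffer-to-image copy */
};

/* An IEEE 754 binary16 value is NaN when its exponent field is all ones
 * (0x7c00) and its mantissa field (0x03ff) is non-zero. */
static bool fp16_bits_are_nan(uint16_t bits)
{
    return (bits & 0x7c00u) == 0x7c00u && (bits & 0x03ffu) != 0;
}

/* Assumption: only the low 16 bits of each component matter for a
 * 16-bit-per-channel format. */
static enum clear_path choose_clear_path(const uint32_t values[4])
{
    for (int i = 0; i < 4; i++)
    {
        if (fp16_bits_are_nan((uint16_t)values[i]))
            return CLEAR_PATH_BUFFER_COPY;
    }
    return CLEAR_PATH_FLOAT_SHADER;
}
```

Infinities (0x7c00, 0xfc00) have a zero mantissa, so this check leaves them on the fast path. Whether that is safe depends on the open questions below.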

Note that a few things are still unclear:

  • Does D3D12 expect the implementation to preserve these NaN values or not? If not, then case 2 above is not needed.
  • Does a float image store with NaN preserve the NaN or not? If it does, then once again case 2 may not be necessary.

ClearUnorderedAccessViewUint is required to preserve the exact bit pattern. I don't know what a shader-based implementation using float stores would look like that could guarantee this without ever producing rounding errors in the two conversions (to float and back) that it would need. The only thing I can see working reliably is to create a buffer with a bit-compatible format, fill it through a texel buffer view, and then copy it to the image. That is inefficient, since we now have to write memory twice and also insert extra barriers. Not great, but the function is used rarely enough that this is probably preferable to losing DCC.

I wasn't aware that UINT views disable compression on FLOAT images even for bit-compatible formats. This is extremely annoying, and DXVK is affected in the same way. Moreover, all games that use TYPELESS formats are affected as well, since those explicitly allow the application to create UINT views. TYPELESS formats are very common, even though the majority of games will never create such views.

Is there a list of DCC-incompatible formats somewhere? Especially on the DXVK side we may need to completely rethink our approach to implementing both typeless formats and UAV clears.

> I don't know what a shader-based implementation using float stores would look like which would guarantee this without ever producing rounding errors during the double conversion that would be required to make it work

For e.g. an FP16 -> FP32 -> FP16 roundtrip we can guarantee perfect results (FP extension just appends zero mantissa bits, so no rounding is required, and a truncation must produce the exact result if the input can be represented exactly), so I think we should be able to do image writes with float correctly in 99.99% of cases. The edge cases are NaN and perhaps -0.0 (hypothetically denorms too, but any denorm FP16 value is normal in FP32). We can always fall back to the existing buffer copy path for annoying cases where we cannot guarantee the roundtrip.
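To illustrate the roundtrip argument, here is a small software model of the two conversions, written only for this comment. The helper names are hypothetical, and it models IEEE 754 bit layouts on the CPU, not what any GPU actually does on an image store. In this model every one of the 65536 FP16 patterns survives the roundtrip, including NaN and -0.0, so any loss on real hardware would have to come from the hardware's own handling of those cases (e.g. NaN canonicalization or denorm flushing):

```c
#include <stdbool.h>
#include <stdint.h>

/* Widen binary16 bits to binary32 bits. Always exact: the 10 mantissa
 * bits are padded with 13 zero bits and the exponent is rebiased. */
static uint32_t half_to_float_bits(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1fu;
    uint32_t man = h & 0x3ffu;

    if (exp == 0x1fu)               /* Inf or NaN, payload kept */
        return sign | 0x7f800000u | (man << 13);
    if (exp == 0)
    {
        uint32_t shift = 0;
        if (man == 0)               /* +0.0 or -0.0 */
            return sign;
        /* FP16 denorm: normalize it, it is a normal number in FP32. */
        while (!(man & 0x400u))
        {
            man <<= 1;
            shift++;
        }
        return sign | ((113u - shift) << 23) | ((man & 0x3ffu) << 13);
    }
    return sign | ((exp + 112u) << 23) | (man << 13);
}

/* Narrow binary32 bits to binary16 bits by truncation. Returns false
 * if the value is not exactly representable in binary16. */
static bool float_to_half_bits_exact(uint32_t f, uint16_t *out)
{
    uint32_t sign = (f >> 16) & 0x8000u;
    uint32_t exp = (f >> 23) & 0xffu;
    uint32_t man = f & 0x7fffffu;

    if (exp == 0xffu || (exp >= 113u && exp <= 142u))
    {
        /* Inf/NaN or normal FP16 range: low 13 mantissa bits must be 0. */
        uint32_t hexp = (exp == 0xffu) ? 0x1fu : exp - 112u;
        if (man & 0x1fffu)
            return false;
        *out = (uint16_t)(sign | (hexp << 10) | (man >> 13));
        return true;
    }
    if (exp == 0)
    {
        if (man != 0)               /* FP32 denorm: too small for FP16 */
            return false;
        *out = (uint16_t)sign;
        return true;
    }
    if (exp >= 103u && exp <= 112u)
    {
        /* FP16 denorm range: shift the 24-bit significand down. */
        uint32_t full = man | 0x800000u;
        uint32_t shift = 126u - exp;
        if (full & ((1u << shift) - 1u))
            return false;
        *out = (uint16_t)(sign | (full >> shift));
        return true;
    }
    return false;                   /* overflow or underflow */
}

/* Count FP16 bit patterns that do not survive FP16 -> FP32 -> FP16. */
static unsigned count_fp16_roundtrip_failures(void)
{
    unsigned failures = 0;
    for (uint32_t h = 0; h <= 0xffffu; h++)
    {
        uint16_t back;
        if (!float_to_half_bits_exact(half_to_float_bits((uint16_t)h), &back) || back != h)
            failures++;
    }
    return failures;
}
```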

> Is there a list of DCC-incompatible formats somewhere? Especially on the DXVK side we may need to completely rethink our approach to implementing both typeless formats and UAV clears.

@doitsujin if I read the radv code correctly, only SINT/UINT/SNORM/UNORM are compatible with each other (there's some special code for sign reinterpretation, but afaict it doesn't disable DCC completely); SFLOAT is incompatible with all of the others. Additionally, component count and size need to match, and storage images with formats that support atomics also cannot use DCC.
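Written out as a predicate, my reading of the rule looks roughly like the sketch below. This is not radv's actual code: the types and the `dcc_view_compatible` function are hypothetical, and the sign-reinterpretation special case and the atomics restriction on storage images are left out.

```c
#include <stdbool.h>

/* Hypothetical, simplified format description for illustration only. */
enum numeric_type { NUM_UINT, NUM_SINT, NUM_UNORM, NUM_SNORM, NUM_SFLOAT };

struct format_desc
{
    enum numeric_type type;
    unsigned component_count;
    unsigned bits_per_component;
};

static bool is_int_or_norm(enum numeric_type type)
{
    return type != NUM_SFLOAT;
}

/* True if a view with format `view` would not force DCC off on an image
 * created with format `image`, under the rule as described above. */
static bool dcc_view_compatible(const struct format_desc *image,
        const struct format_desc *view)
{
    /* Component count and size must match. */
    if (image->component_count != view->component_count)
        return false;
    if (image->bits_per_component != view->bits_per_component)
        return false;
    /* The same numeric type is always fine. */
    if (image->type == view->type)
        return true;
    /* SINT/UINT/SNORM/UNORM mix with each other, SFLOAT with none of them. */
    return is_int_or_norm(image->type) && is_int_or_norm(view->type);
}
```

With this rule, R16G16B16A16_FLOAT viewed as R16G16B16A16_UINT comes out incompatible, which matches the case in this issue.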