ValveSoftware / gamescope

SteamOS session compositing window manager

direct scan-out: toggling mangohud causes artifacts to appear sporadically in gamescope-session during pipeline split (suspected drm/amd)

matte-schwartz opened this issue

NOTE:
I'm merging my two past issues in this one as they provided an inaccurate description of the actual bug I've discovered.
#1343
#1293

there are two key components behind every artifacting issue I've reported since May 9th:

  1. when/how gamescope decides it should composite actively or not using direct scan-out, determined with the compositor debugger in Steam

  2. what happens when gamescope isn't actively compositing, especially on kernels past 6.8.0 including 6.8.0-valve1-preview and the latest HEAD of amd-staging-drm-next, but also on 6.5.0-valve7 to a minor extent.


breakdown

on certain display modesets, which I'll call "bad" modesets, gamescope decides it should stop compositing entirely within gamescope-session. for me, this seems to be the absolute highest resolution on the list of options within gamescope-session for my external ultrawide monitor, 3440x1440@60hz. on my ROG Ally this is 1920x1080@120hz.

when this happens, unless "force composite" is selected in the dev settings of steam, many games will have severe artifacting on kernels past 6.8.0 - AND - you will have minor artifacting when you toggle the HUD like i've previously reported on SteamOS Main/Preview on any kernel past 6.1.52 that i've tried: #1343
the reason my libliftoff revert worked when i was testing this last week is that I reverted it back into a state where gamescope is always compositing. the issue was never fixed after all, just hidden due to an error in my testing method.

Video 1 (BAD MODESET) shows one of these bad modeset sessions at 3440x1440@60hz and how force-composite fixes the artifacts https://youtu.be/h6i5brCm9Jc


gamescope also seems to get stuck in this bad modeset, as trying to switch to a "good" modeset does not fix the issue for me until I totally reboot my device after changing the display settings within gamescope-session, OR if I delete edid.bin from ~/.config/gamescope and re-enter gamescope-session. if i boot into one of my monitor's good modesets, like the second-highest option 3440x1440@175hz on the resolution list of gamescope-session, then the entire session seems to properly determine when to composite and when not to until I switch to a bad modeset.

Video 2 (GOOD MODESET INTO BAD MODESET) shows one of these good modeset sessions at 3440x1440@175hz and how it doesn't need the force composite, then switching into a bad modeset and how it ruins the session https://youtu.be/_9oiP39czKo


conclusion

If I wasn't able to repro this on multiple setups with different hardware across both internal and external screen types, I would have blamed a bad cable or faulty monitor. However, some version of this issue is present on all 4 AMD devices I own in some capacity: a laptop with a 7900xtx, ROG Ally with a z1 extreme apu, Steam Deck OLED, and to a lesser extent, Steam Deck LCD.

My best guess is that something between kernel 6.7 and 6.8 changed the way gamescope modesetting functions, while older kernels have a slightly less bad version of this issue (the HUD switching artifacting is present but not the full artifacting like Video 1). I was not able to repro this on my NVIDIA setups under the same conditions, which is why it's hard for me to know where to actually take this issue up - AMD/DRM or gamescope? since to my eyes it seems like it's about a 50-50 split between where the issue may lie...

Happy to provide more evidence or logs upon request, but that's my conclusion after thorough testing across Fedora, Arch Linux and SteamOS.

Extended report: SteamOS Stable, SteamOS Preview, SteamOS Main, linux-neptune-61, linux-neptune-65, and linux-neptune-68.

SteamOS Stable, linux-neptune-61, no modifications
this issue is not reproducible via any method.


SteamOS Main/Preview and linux-neptune-65, no modifications
This is the least affected version of SteamOS, but it still has one quirk in particular which is related to gamescope's "lazy" compositing. this is the issue where MangoHud's performance overlay causes artifacts when toggling back and forth from Off -> On.

To repro this issue:

  1. From the steam library home page, press A to enter a game's entry
  2. From this game entry, now press B to exit back into your library home page
  3. Toggle the steam performance overlay from Off -> any of the overlay presets
  4. Toggle the steam performance overlay back to Off
  5. repeat steps 3 and 4 multiple times and notice the artifacts that briefly appear on the screen

Force compositing fixes this.

Here is a video of steps 1 -> 5 on my Steam Deck OLED - LCD deck has the same issue.

IMG_1469.mov

SteamOS Main and linux-neptune-68
the only thing I modified here to verify this was my GRUB settings to boot me into 6.8.0-valve1-preview
This is the most affected version of SteamOS, and matches the behavior that I noticed across all of my other AMD devices. That includes the previously described Mangohud overlay regression, plus the primary 6.8.0 regression relating to direct scan-out.

To repro this issue - please use a D3D12 game. they seem to "break" and cause artifacting on the display the most consistently when gamescope isn't compositing. all of my testing was done with Hades II if that helps:

  1. Enter into your D3D12 game of choice
  2. to trigger the wave of artifacting, you can toggle the side menus. you can also do this with a real mouse, or by using the touchpads as emulated mouse input by holding down one of the menu buttons
  3. turn on force compositing in the steam dev settings and watch all artifacts disappear

Here is a video of how to do the first two steps with Hades II on OLED deck - LCD Deck has the same issue

329386655-96bdea03-f075-4938-963d-f1718568354a.mov

Force compositing also fixes this issue.
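
For quick reference, the per-kernel results in this extended report can be condensed into a small lookup. This is only a sketch of what was observed here: `reported_symptoms` and its version thresholds are hypothetical shorthand for the linux-neptune-61/65/68 results above, not anything gamescope or the kernel exposes, and kernel series that were not tested return `None`.

```python
import re


def kernel_series(release):
    """Extract (major, minor) from a release string such as '6.8.0-valve1-preview'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def reported_symptoms(release):
    """Map a kernel release to the symptoms described in this report.

    Assumption: thresholds mirror the neptune-61/65/68 findings only.
    Returns None for series the report does not cover.
    """
    series = kernel_series(release)
    if series >= (6, 8):
        return ["overlay-toggle artifacts", "direct scan-out artifacts"]
    if series >= (6, 5):
        return ["overlay-toggle artifacts"]
    if series == (6, 1):
        return []  # not reproducible on linux-neptune-61
    return None


for release in ("6.1.52-valve19", "6.5.0-valve7", "6.8.0-valve1-preview"):
    print(release, reported_symptoms(release))
```

In every affected case above, enabling force compositing in the Steam dev settings was the workaround.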

Can replicate it on both ChimeraOS (testing branch) and Nobara 40.
Issue is present whenever the FPS is above the set refresh rate. Happens with or without VRR.

> SteamOS Main and linux-neptune-68: the only thing I modified here to verify this was my GRUB settings to boot me into 6.8.0-valve1-preview. This is the most affected version of SteamOS, and matches the behavior that I noticed across all of my other AMD devices. That includes the previously described Mangohud overlay regression, plus the primary 6.8.0 regression relating to direct scan-out.

After bisecting the kernel tree multiple times, my only conclusion is that this is directly related to the kernel versioning per 8309609. Gamescope only started working again for me on all my earlier 6.8-rc candidates when I backported torvalds/linux@2aa6f5b from 6.8-rc6, which fixes explicit sync. I've tried around 100 kernels in the past day, and every one on 6.8 or later has artifacting. Setting

gamescope::ConVar<bool> cv_drm_debug_disable_explicit_sync( "drm_debug_disable_explicit_sync", false, "Force disable explicit sync on the DRM backend." );
to true or force compositing both work to get around the issue.

It must be triggered by having a certain set of explicit sync compatible packages installed, whether that be some combination of Mesa, xwayland, gamescope, or whatever else has implemented explicit sync. It has to be a combo already present on SteamOS Main (plus the latest 6.8 valve preview kernel) as well, which narrows it down significantly. Perhaps a Mesa slice? Maybe I'll dig into it when I haven't just built more kernels than ever before in my life 🐸

also have not yet had a chance to look into the SteamOS Preview issue I reported with MangoHud's artifacting so that's on my list to get back to soon

> SteamOS Main/Preview and linux-neptune-65, no modifications: This is the least affected version of SteamOS, but it still has one quirk in particular which is related to gamescope's "lazy" compositing. this is the issue where MangoHud's performance overlay causes artifacts when toggling back and forth from Off -> On.

I've stared at some DRM debugging logs for 6.1.52-valve19, 6.5.0-valve11, and my normal 6.9.5-fsync kernels and I think I've figured out where the artifacting comes from. NOTE - this seems to be entirely separate from the explicit sync artifacting.

When I toggle MangoHud on 6.1.52-valve19, my DRM kernel logs show this for disabling the overlay

[  625.145145] [drm:dm_update_plane_state [amdgpu]] Disabling DRM plane: 90 on DRM crtc 100
[  625.145425] amdgpu 0000:04:00.0: [drm:drm_atomic_get_private_obj_state] Added new private object 000000004db1c4e7 state 000000002df8647f to 00000000ff8a9ab1
[  625.145429] [drm:dm_update_plane_state [amdgpu]] Disabling DRM plane: 84 on DRM crtc 100
[  625.145670] [drm:dm_update_plane_state [amdgpu]] Disabling DRM plane: 72 on DRM crtc 100
[  625.145932] [drm:dm_update_plane_state [amdgpu]] Enabling DRM plane: 90 on DRM crtc 100
[  625.146177] [drm:dm_crtc_helper_atomic_check [amdgpu]] Can't enable a CRTC without enabling the primary plane
[  625.146464] amdgpu 0000:04:00.0: [drm:drm_atomic_helper_check_planes] [CRTC:100:crtc-0] atomic driver check failed
[  625.146469] [drm:amdgpu_dm_atomic_check [amdgpu]] drm_atomic_helper_check_planes() failed
[  625.146743] [drm:amdgpu_dm_atomic_check [amdgpu]] Atomic check failed with err: -22 

and this for enabling the overlay

[  700.101125] [drm:dm_update_plane_state [amdgpu]] Disabling DRM plane: 90 on DRM crtc 100
[  700.101401] amdgpu 0000:04:00.0: [drm:drm_atomic_get_private_obj_state] Added new private object 000000004db1c4e7 state 000000003dd68ca8 to 0000000030834f76
[  700.101405] [drm:dm_update_plane_state [amdgpu]] Disabling DRM plane: 72 on DRM crtc 100
[  700.101668] [drm:dm_update_plane_state [amdgpu]] Enabling DRM plane: 90 on DRM crtc 100
[  700.102029] [drm:dm_update_plane_state [amdgpu]] Enabling DRM plane: 84 on DRM crtc 100
[  700.102395] [drm:dm_update_plane_state [amdgpu]] Enabling DRM plane: 72 on DRM crtc 100
[  700.102654] [drm:amdgpu_dm_atomic_check [amdgpu]] MPO enablement requested on crtc:[00000000cc319b52]
[  700.102952] amdgpu 0000:04:00.0: [drm:drm_atomic_nonblocking_commit] committing 0000000030834f76 nonblocking
[  700.102959] [drm:dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  700.103202] [drm:dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  700.103442] [drm:dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  700.103688] [drm:dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  700.103933] [drm:dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  700.104172] [drm:dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  700.104411] [drm:dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  700.104652] [drm:dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  700.105267] [drm:dcn301_smu_set_min_deep_sleep_dcfclk [amdgpu]] dcn301_smu_set_min_deep_sleep_dcfclk(17832)
[  700.105754] [drm:dcn301_smu_set_dppclk [amdgpu]] dcn301_smu_set_dppclk(102857)
[  700.106507] [drm:dcn20_program_front_end_for_ctx [amdgpu]] Reset mpcc for pipe 1
[  700.114021] [drm:mpc2_assert_idle_mpcc [amdgpu]] REG_WAIT taking a while: 3ms in mpc2_assert_idle_mpcc line:478
[  700.114419] [drm:dcn10_plane_atomic_power_down [amdgpu]] Power gated front end 1
[  700.114793] [drm:dcn20_post_unlock_program_front_end [amdgpu]] Power down front end 1

There is no artifacting present at any point during testing, whether that be on the Home screen, Library, or in-game.
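
To compare the two 6.1.52-valve19 excerpts above at a glance, the plane messages can be reduced to a final plane set per commit. The sketch below is not part of gamescope or the kernel: `summarize_commit` is a hypothetical helper, and it assumes that the last `Enabling`/`Disabling` message for a plane reflects the state that commit asked for.

```python
import re

# Matches the amdgpu plane state messages seen in the excerpts above.
PLANE_RE = re.compile(r"(Enabling|Disabling) DRM plane: (\d+) on DRM crtc (\d+)")


def summarize_commit(lines):
    """Return (sorted enabled plane ids, atomic_check_failed) for one commit's lines."""
    state = {}
    failed = False
    for line in lines:
        match = PLANE_RE.search(line)
        if match:
            # Assumption: the last message for a plane wins.
            state[int(match.group(2))] = match.group(1) == "Enabling"
        elif "Atomic check failed" in line:
            failed = True
    return sorted(plane for plane, on in state.items() if on), failed


# Messages copied from the 6.1.52-valve19 excerpts (timestamps dropped).
DISABLE_OVERLAY = [
    "[drm:dm_update_plane_state [amdgpu]] Disabling DRM plane: 90 on DRM crtc 100",
    "[drm:dm_update_plane_state [amdgpu]] Disabling DRM plane: 84 on DRM crtc 100",
    "[drm:dm_update_plane_state [amdgpu]] Disabling DRM plane: 72 on DRM crtc 100",
    "[drm:dm_update_plane_state [amdgpu]] Enabling DRM plane: 90 on DRM crtc 100",
    "[drm:amdgpu_dm_atomic_check [amdgpu]] Atomic check failed with err: -22",
]
ENABLE_OVERLAY = [
    "[drm:dm_update_plane_state [amdgpu]] Disabling DRM plane: 90 on DRM crtc 100",
    "[drm:dm_update_plane_state [amdgpu]] Disabling DRM plane: 72 on DRM crtc 100",
    "[drm:dm_update_plane_state [amdgpu]] Enabling DRM plane: 90 on DRM crtc 100",
    "[drm:dm_update_plane_state [amdgpu]] Enabling DRM plane: 84 on DRM crtc 100",
    "[drm:dm_update_plane_state [amdgpu]] Enabling DRM plane: 72 on DRM crtc 100",
]

print(summarize_commit(DISABLE_OVERLAY))  # ([90], True)
print(summarize_commit(ENABLE_OVERLAY))   # ([72, 84, 90], False)
```

Read this way, the overlay-off request on 6.1.52-valve19 (only plane 90 left enabled) is the one that fails the atomic check with -22 (EINVAL), while the overlay-on request with all three planes is committed.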


Now, moving on to newer kernels, including 6.5.0-valve11, is where things start to get interesting. Here are my logs disabling the overlay on 6.5.0-valve11

[   73.504864] [drm:dm_update_plane_state [amdgpu]] Disabling DRM plane: 90 on DRM crtc 99
[   73.505377] amdgpu 0000:04:00.0: [drm:drm_atomic_get_private_obj_state] Added new private object 0000000076e690bd state 000000008fd721c6 to 00000000f2b6b127
[   73.505386] [drm:dm_update_plane_state [amdgpu]] Disabling DRM plane: 84 on DRM crtc 99
[   73.505876] [drm:dm_update_plane_state [amdgpu]] Disabling DRM plane: 72 on DRM crtc 99
[   73.506346] [drm:dm_update_plane_state [amdgpu]] Enabling DRM plane: 90 on DRM crtc 99
[   73.509307] [drm:dm_update_plane_state [amdgpu]] Enabling DRM plane: 72 on DRM crtc 99
[   73.512227] [drm:amdgpu_dm_atomic_check [amdgpu]] MPO enablement requested on crtc:[0000000052e6bb0b]
[   73.512788] amdgpu 0000:04:00.0: [drm:drm_atomic_nonblocking_commit] committing 00000000f2b6b127 nonblocking
[   73.512801] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[   73.513238] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[   73.513697] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[   73.514164] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[   73.514622] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[   73.515048] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[   73.515500] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[   73.515975] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[   73.516416] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[   73.517255] amdgpu 0000:04:00.0: [drm:commit_minimal_transition_state [amdgpu]] commit_minimal_transition_state base = new state, reason = MPC in Use
[   73.517846] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]     pipe topology update
[   73.518150] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]   ________________________
[   73.518431] amdgpu 0000:04:00.0: [drm:resource_log_pipe [amdgpu]]  | plane0  slice0  stream0|
[   73.518715] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]  |DPP0----OPP0----OTG0----|
[   73.518991] amdgpu 0000:04:00.0: [drm:resource_log_pipe [amdgpu]]  | plane1 |               |
[   73.519342] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]  |DPP3----|               |
[   73.519637] amdgpu 0000:04:00.0: [drm:dcn20_program_front_end_for_ctx [amdgpu]]  |________________________|
[   73.519984] amdgpu 0000:04:00.0: [drm:dcn20_program_front_end_for_ctx [amdgpu]] Reset mpcc for pipe 2
[   73.527728] amdgpu 0000:04:00.0: [drm:dcn10_plane_atomic_power_down [amdgpu]] Power gated front end 2
[   73.528109] amdgpu 0000:04:00.0: [drm:dcn20_post_unlock_program_front_end [amdgpu]] Power down front end 2
[   73.528527] amdgpu 0000:04:00.0: [drm:dcn301_smu_set_min_deep_sleep_dcfclk [amdgpu]] dcn301_smu_set_min_deep_sleep_dcfclk(11888)
[   73.529265] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]     pipe topology update
[   73.529580] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]   ________________________
[   73.529860] amdgpu 0000:04:00.0: [drm:resource_log_pipe [amdgpu]]  | plane0  slice0  stream0|
[   73.530139] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]  |DPP0----OPP0----OTG0----|
[   73.530415] amdgpu 0000:04:00.0: [drm:resource_log_pipe [amdgpu]]  | plane0 |               |
[   73.530692] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]  |DPP1----|               |
[   73.530966] amdgpu 0000:04:00.0: [drm:resource_log_pipe [amdgpu]]  | plane1 |               |
[   73.531241] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]  |DPP3----|               |
[   73.531514] amdgpu 0000:04:00.0: [drm:resource_log_pipe [amdgpu]]  | plane1 |               |
[   73.531786] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]  |DPP2----|               |
[   73.532061] amdgpu 0000:04:00.0: [drm:dcn20_program_front_end_for_ctx [amdgpu]]  |________________________|
[   73.534989] amdgpu 0000:04:00.0: [drm:dcn20_program_pipe [amdgpu]] Un-gated front end for pipe 1
[   73.540173] amdgpu 0000:04:00.0: [drm:dcn20_program_pipe [amdgpu]] Un-gated front end for pipe 2

and here are my logs for enabling the overlay:

[  120.880858] [drm:dm_update_plane_state [amdgpu]] Disabling DRM plane: 90 on DRM crtc 99
[  120.881240] amdgpu 0000:04:00.0: [drm:drm_atomic_get_private_obj_state] Added new private object 0000000076e690bd state 00000000d571bd77 to 0000000092f6d636
[  120.881247] [drm:dm_update_plane_state [amdgpu]] Disabling DRM plane: 72 on DRM crtc 99
[  120.881599] [drm:dm_update_plane_state [amdgpu]] Enabling DRM plane: 90 on DRM crtc 99
[  120.884116] [drm:dm_update_plane_state [amdgpu]] Enabling DRM plane: 84 on DRM crtc 99
[  120.886600] [drm:dm_update_plane_state [amdgpu]] Enabling DRM plane: 72 on DRM crtc 99
[  120.889091] [drm:amdgpu_dm_atomic_check [amdgpu]] MPO enablement requested on crtc:[0000000052e6bb0b]
[  120.889511] amdgpu 0000:04:00.0: [drm:drm_atomic_nonblocking_commit] committing 0000000092f6d636 nonblocking
[  120.889522] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  120.889954] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  120.890377] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  120.890774] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  120.891177] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  120.891567] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  120.891922] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  120.892393] [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] No FB bound
[  120.892859] amdgpu 0000:04:00.0: [drm:commit_minimal_transition_state [amdgpu]] commit_minimal_transition_state base = current state, reason = MPC in Use
[  120.893490] amdgpu 0000:04:00.0: [drm:dcn301_smu_set_dppclk [amdgpu]] dcn301_smu_set_dppclk(102857)
[  120.894041] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]     pipe topology update
[  120.894371] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]   ________________________
[  120.894662] amdgpu 0000:04:00.0: [drm:resource_log_pipe [amdgpu]]  | plane0  slice0  stream0|
[  120.894944] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]  |DPP0----OPP0----OTG0----|
[  120.895236] amdgpu 0000:04:00.0: [drm:resource_log_pipe [amdgpu]]  | plane1 |               |
[  120.895517] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]  |DPP3----|               |
[  120.895796] amdgpu 0000:04:00.0: [drm:dcn20_program_front_end_for_ctx [amdgpu]]  |________________________|
[  120.896132] amdgpu 0000:04:00.0: [drm:dcn20_program_front_end_for_ctx [amdgpu]] Reset mpcc for pipe 1
[  120.896539] amdgpu 0000:04:00.0: [drm:dcn20_program_front_end_for_ctx [amdgpu]] Reset mpcc for pipe 2
[  120.906470] amdgpu 0000:04:00.0: [drm:mpc2_assert_idle_mpcc [amdgpu]] REG_WAIT taking a while: 4ms in mpc2_assert_idle_mpcc line:478
[  120.906850] amdgpu 0000:04:00.0: [drm:dcn10_plane_atomic_power_down [amdgpu]] Power gated front end 1
[  120.907179] amdgpu 0000:04:00.0: [drm:dcn20_post_unlock_program_front_end [amdgpu]] Power down front end 1
[  120.907522] amdgpu 0000:04:00.0: [drm:dcn10_plane_atomic_power_down [amdgpu]] Power gated front end 2
[  120.907842] amdgpu 0000:04:00.0: [drm:dcn20_post_unlock_program_front_end [amdgpu]] Power down front end 2
[  120.908610] amdgpu 0000:04:00.0: [drm:dcn301_smu_set_min_deep_sleep_dcfclk [amdgpu]] dcn301_smu_set_min_deep_sleep_dcfclk(17832)
[  120.909603] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]     pipe topology update
[  120.909950] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]   ________________________
[  120.910263] amdgpu 0000:04:00.0: [drm:resource_log_pipe [amdgpu]]  | plane0  slice0  stream0|
[  120.910547] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]  |DPP0----OPP0----OTG0----|
[  120.910841] amdgpu 0000:04:00.0: [drm:resource_log_pipe [amdgpu]]  | plane1 |               |
[  120.911139] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]  |DPP3----|               |
[  120.911440] amdgpu 0000:04:00.0: [drm:resource_log_pipe [amdgpu]]  | plane2 |               |
[  120.911744] amdgpu 0000:04:00.0: [drm:resource_log_pipe_topology_update [amdgpu]]  |DPP2----|               |
[  120.912044] amdgpu 0000:04:00.0: [drm:dcn20_program_front_end_for_ctx [amdgpu]]  |________________________|
[  120.917410] amdgpu 0000:04:00.0: [drm:dcn20_program_pipe [amdgpu]] Un-gated front end for pipe 2
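
The "pipe topology update" blocks in the 6.5.0-valve11 logs above are easier to compare once each one is reduced to a plane-to-DPP mapping. The following is a rough sketch, not an amdgpu tool: `parse_topologies` and `split_planes` are hypothetical helpers, and treating a plane that appears on more than one DPP as "split" is an assumption about how to read these diagrams.

```python
import re

PLANE_RE = re.compile(r"\|\s*(plane\d+)\b")
DPP_RE = re.compile(r"\|(DPP\d+)-")


def parse_topologies(lines):
    """Return one {plane: [DPP, ...]} dict per 'pipe topology update' block."""
    topologies, current, plane = [], None, None
    for line in lines:
        if "pipe topology update" in line:
            current, plane = {}, None
            topologies.append(current)
        elif current is None:
            continue
        elif "|____" in line:  # bottom border closes the block
            current = None
        elif PLANE_RE.search(line):
            plane = PLANE_RE.search(line).group(1)
        elif DPP_RE.search(line) and plane is not None:
            current.setdefault(plane, []).append(DPP_RE.search(line).group(1))
    return topologies


def split_planes(topology):
    """Planes that are driven by more than one DPP in this topology."""
    return {p: dpps for p, dpps in topology.items() if len(dpps) > 1}


# The two diagrams from the overlay-off log on 6.5.0-valve11
# (timestamps and function prefixes dropped).
SAMPLE = """\
    pipe topology update
  ________________________
 | plane0  slice0  stream0|
 |DPP0----OPP0----OTG0----|
 | plane1 |               |
 |DPP3----|               |
 |________________________|
    pipe topology update
  ________________________
 | plane0  slice0  stream0|
 |DPP0----OPP0----OTG0----|
 | plane0 |               |
 |DPP1----|               |
 | plane1 |               |
 |DPP3----|               |
 | plane1 |               |
 |DPP2----|               |
 |________________________|
""".splitlines()

for topology in parse_topologies(SAMPLE):
    print(topology, "split:", sorted(split_planes(topology)))
```

On this sample the second block maps plane0 to DPP0 and DPP1 and plane1 to DPP3 and DPP2, whereas the overlay-on log ends with three planes on one DPP each.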

Now as to why the artifacting is not constantly present when using this method of adding/removing the overlay plane, that has also been answered by kernel drm debugging logs.

If I watch my dmesg via ssh with sudo dmesg --follow, I can actually see that toggling the overlay only causes artifacts if the display is in a constant cycle of drm_atomic and amdgpu_dm_plane_helper_prepare_fb state changes like on the Home page of the Steam Deck or while in game.

If the cycle of changing planes is idle while you toggle the overlay, like [drm:amdgpu_dm_crtc_vblank_control_worker [amdgpu]] Allow idle optimizations (MALL): 0 then no artifacting occurs.

What do we think, gamescope issue or should I be reporting this to amd after all? I'll admit I'm a bit vexed at this point.

@matte-schwartz
Yeah you may want to also post an issue here:
https://gitlab.freedesktop.org/drm/amd

I've spoken with Mario about the explicit sync issue offline while diagnosing that type of artifacting, let me file a new issue for this since it seems to be separate from the explicit sync stuff.

the accompanying bug report for this specific overlay plane artifacting issue is here: https://gitlab.freedesktop.org/drm/amd/-/issues/3441

For SteamOS 3.6 and SteamOS Main's kernels:

The real issue is the pipeline split that takes place when toggling the overlay on Linux 6.5 and later (logs here for reference). I was able to narrow it down to the pipeline split by using

enum DC_DEBUG_MASK {
	DC_DISABLE_PIPE_SPLIT = 0x1,

in the grub cmdline as amdgpu.dcdebugmask=0x1.
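
To double-check that a booted kernel actually picked up the mask, the command line can be inspected for the pipe split bit. This is a minimal sketch: `dcdebugmask_from_cmdline` and `pipe_split_disabled` are hypothetical helpers, and the only flag value used is `DC_DISABLE_PIPE_SPLIT = 0x1` from the enum excerpt above.

```python
import re

DC_DISABLE_PIPE_SPLIT = 0x1  # value taken from the enum excerpt above


def dcdebugmask_from_cmdline(cmdline):
    """Return the amdgpu.dcdebugmask value from a kernel command line, or None."""
    match = re.search(r"(?:^|\s)amdgpu\.dcdebugmask=(\S+)", cmdline)
    if match is None:
        return None
    # Base 0 accepts both decimal ("1") and 0x-prefixed hex ("0x1").
    return int(match.group(1), 0)


def pipe_split_disabled(cmdline):
    """True if the command line sets the DC_DISABLE_PIPE_SPLIT bit."""
    mask = dcdebugmask_from_cmdline(cmdline)
    return mask is not None and bool(mask & DC_DISABLE_PIPE_SPLIT)


print(pipe_split_disabled("quiet splash amdgpu.dcdebugmask=0x1"))
print(pipe_split_disabled("quiet splash"))
```

On a running system the input would typically be the contents of /proc/cmdline.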

Here is a clip on 6.5.0-valve13 without the debugmask set, where you can briefly see artifacts appear when the overlay is toggled as I originally reported:

IMG_1822.mov

And here is a clip of the same kernel but adding amdgpu.dcdebugmask=0x1 into the commandline:

IMG_1823.mov

It's a tough bisect on the kernel side due to the nature of patching a kernel so it does not try and composite when using an overlay, since the issue is not present if gamescope composites with the overlay enabled.

(sidenote - that speck in the middle of the clips is dust not a panel issue 🐸)

edit: in lieu of the debugmask, this kernel patch also seems to work when applied to 6.5.0-valve13:

From 987d8fe085675aaf9f5d689d25e1688a705dcd7e Mon Sep 17 00:00:00 2001
From: Matthew Schwartz <mattschwartz@gwu.edu>
Date: Thu, 18 Jul 2024 00:15:01 +0000
Subject: [PATCH] [NOT-FOR-UPSTREAM] drm/amd/display: disable pipe split for
 DCN3.1

Pipe splitting seems to have regressed between 6.1 and 6.5 when using
Neptune kernel patches, causing artifacts to appear when the pipeline
gets split while calling/removing the Steam performance overlay.

For now, let's move to using MPC_SPLIT_AVOID instead of MPC_SPLIT_DYNAMIC.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3441
Signed-off-by: Matthew Schwartz <mattschwartz@gwu.edu>
---
 .../gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c    | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c b/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c
index 7aa71efd091c..9a83842f1c77 100644
--- a/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c
@@ -689,7 +689,7 @@ static const struct dc_debug_options debug_defaults_drv = {
 	.disable_clock_gate = true,
 	.disable_pplib_clock_request = true,
 	.disable_pplib_wm_range = true,
-	.pipe_split_policy = MPC_SPLIT_DYNAMIC,
+	.pipe_split_policy = MPC_SPLIT_AVOID,
 	.force_single_disp_pipe_split = false,
 	.disable_dcc = DCC_ENABLE,
 	.vsr_support = true,
-- 
2.45.2

edit 2:
well that patch is no good since it just puts us back to where we were with this bug report: https://gitlab.freedesktop.org/drm/amd/-/issues/2247

Only solution is to find out what in the pipeline topology logic causes this issue which is definitely beyond my skill level to figure out.

no changes here with v3.14.26 sadly, explicit sync must still be disabled for the DRM backend on AMD cards

6.8 artifacting fixed with: f35e1b3 @hivehivemind

the mangohud issue is being worked on in drm/amd, retitling this issue for clarity now that there's only one kind of scan-out issue report here. really should have been two separate reports after all, huh...