ValveSoftware / gamescope

SteamOS session compositing window manager

commit 299bc34 causes the Steam UI to lag while showing MangoHud on VRR displays when using direct scan-out

matte-schwartz opened this issue · comments

matt@nobara-pc:~/src/gamescope$ git bisect bad
299bc3410dcfd46da5e3c988354b60ed3a356900 is the first bad commit
commit 299bc3410dcfd46da5e3c988354b60ed3a356900
Author: Joshua Ashton <joshua@froggi.es>
Date:   Fri May 17 10:01:49 2024 +0100

    steamcompmgr: Move outdatedInteractiveFocus to window

 src/steamcompmgr.cpp        | 39 +++++++++++++++------------------------
 src/steamcompmgr_shared.hpp |  1 +
 2 files changed, 16 insertions(+), 24 deletions(-)
matt@nobara-pc:~/src/gamescope$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [e998f26a6fe4439461dfeaa6dd57c5be0bb46953] InputEmulation: refcounting/lifetime fixes
git bisect good e998f26a6fe4439461dfeaa6dd57c5be0bb46953
# status: waiting for bad commit, 1 good commit known
# bad: [05fe96c05b54c61ac938204d6b39742318d9ae31] Revert "Update libliftoff"
git bisect bad 05fe96c05b54c61ac938204d6b39742318d9ae31
# good: [d404e0f069d343d8084415832a006585fdef9c99] wlserver: Fix content overrides for reparented windows
git bisect good d404e0f069d343d8084415832a006585fdef9c99
# bad: [aabe499be10a32bf1431e3790dcd216f4258844f] wlserver, steamcompmgr: Don't force repaint on cursor moves if no image
git bisect bad aabe499be10a32bf1431e3790dcd216f4258844f
# good: [312e25b14640f3fa88469b57e898a4b2c069a186] InputEmulation: refcounting/lifetime fixes
git bisect good 312e25b14640f3fa88469b57e898a4b2c069a186
# good: [9350c88007b351c0406702f872e6db3ca9160fc4] steamcompmgr: Add GetGlobalPossibleFocusWindows
git bisect good 9350c88007b351c0406702f872e6db3ca9160fc4
# bad: [751e728d2f2657446ce6a9cdabc3d1f0ca36c01d] steamcompmgr: Add customizable pipewire appid focus
git bisect bad 751e728d2f2657446ce6a9cdabc3d1f0ca36c01d
# bad: [299bc3410dcfd46da5e3c988354b60ed3a356900] steamcompmgr: Move outdatedInteractiveFocus to window
git bisect bad 299bc3410dcfd46da5e3c988354b60ed3a356900
# first bad commit: [299bc3410dcfd46da5e3c988354b60ed3a356900] steamcompmgr: Move outdatedInteractiveFocus to window

299bc34 causes the Steam UI to lag when gamescope is not actively compositing, as determined by the compositor debugging squares (enabled the entire time) with MangoHud overlay. unfortunately, this issue is unrelated to the other gamescope compositing issues I reported because of course life can't be that simple 🐸

Before reverting 299bc34 - stutter and ~40fps (from 6f4bc2e):

IMG_1611-1.mov

After reverting 299bc34 - no stutter and 60fps (from 6f4bc2e):

IMG_1612-1.mov

well, using mangohud to monitor UI performance ended up throwing a slight wrench in the report... the UI seems to lag the most for me only while presenting mangohud. if mangohud is not on screen the UI performance is much closer to expectations. force compositing still fixes the performance when showing mangohud though

was able to narrow down the effects of this revert further to VRR being enabled on panels that support it. if VRR is on, the revert brings the steam ui performance up from a shaky ~48fps to a stable ~60fps at 3440x1440@60Hz on a 7900xtx within gamescope-session

with the new adaptive_sync convar in gamescope, this issue is now much easier to repro on OLED Deck. Set the MangoHud overlay to full while the screen hasn't entered a "[drm:amdgpu_dm_crtc_vblank_control_worker [amdgpu]] Allow idle optimizations (MALL): 0" state, and you will see the UI fps tank with gamescopectl adaptive_sync 1 and then shoot back up to 90fps with gamescopectl adaptive_sync 0
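The toggle described above can be scripted so the fps change is easier to watch in MangoHud. This is only a rough sketch: the function name, the GAMESCOPECTL variable and the hold time are made up for illustration, and the only commands taken from this thread are `gamescopectl adaptive_sync 1` and `gamescopectl adaptive_sync 0`.

```shell
#!/bin/sh
# Alternate gamescope's adaptive_sync convar between 1 and 0.
# GAMESCOPECTL and toggle_adaptive_sync are illustrative names (assumptions);
# only "gamescopectl adaptive_sync 0|1" comes from the issue report.
GAMESCOPECTL="${GAMESCOPECTL:-gamescopectl}"

toggle_adaptive_sync() {
    cycles="$1"   # number of on/off cycles to run
    hold="$2"     # seconds to stay in each state
    i=0
    while [ "$i" -lt "$cycles" ]; do
        for state in 1 0; do
            # Print the state first so it can be matched against the overlay.
            echo "adaptive_sync $state"
            $GAMESCOPECTL adaptive_sync "$state"
            sleep "$hold"
        done
        i=$((i + 1))
    done
}

# Example (run inside the gamescope session, MangoHud overlay set to full):
# toggle_adaptive_sync 3 10
```

Holding each state for several seconds should give the MangoHud fps readout time to settle before the next switch.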

@matte-schwartz
Is this all tested on embedded gamescope (aka "gaming mode" on steam deck)?
Especially w/ embedded mode, you could try profiling w/ gpuvis: https://github.com/ValveSoftware/gamescope/wiki/Tracing

I would advise you to try the above gpuvis profiler first, but there’s also an alternative profiling integration I’ve been working on, the Tracy profiler: #1328
One neat thing about Tracy profiler is that the profiling stats are shown live while gamescope is running.
Caveat with my Tracy profiler integration:

  • Tracy profiler doesn’t collect the DRM-related stats that gpuvis captures. Though it does do gpu/vulkan instrumentation which is useful when gamescope has to do compositing w/ async vulkan compute.
  • The commit tracking is less detailed compared to gpuvis
  • the frame-time tracking for gamescope-xwm covers some post-render/post-present work, so can be a bit longer than the actual frame time
  • Oh also you’ll need to run gamescope as root in order to capture hardware vblank markers

Note that when building gamescope w/ my Tracy integration, you have to add -DTRACY_ENABLE -Dtracy_enable=true to meson setup/configure cmd
Also you have to install the Tracy server to view the profiling data captured from gamescope
How to build Tracy server:
First make sure you have capstone installed

git clone https://github.com/wolfpld/tracy.git
cd tracy/profiler
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
MAKEFLAGS="-j$(nproc)" make

Now that I think about it, I should probably add some code to use the Tracy custom data plotting functionality w/ the frametime info that’s passed to mangoapp…

this is all tested from SteamOS Main in the shipped gamescope-session, with gamescope compiled from latest git using the SteamOS PKGBUILD paired with the steamos devkit client. I will give Tracy profiling a test in a bit, but the devkit makes gpuvis stupidly easy to use.

from boot (with adaptive sync disabled):
(deck@steamdeck ~)$ gamescopectl adaptive_sync 1
(deck@steamdeck ~)$ gamescopectl adaptive_sync 0
(deck@steamdeck ~)$ gamescopectl adaptive_sync 1

the gpu-trace.zip: https://drive.filen.io/d/69d86e9d-f12e-4b84-9d19-3d7be92f8f8f#flpeGVfwN6bFhRsi9B70uOWI6pFIpLaY
(way too big for GitHub lol)

Hopefully this offers a bit more insight into the issue. Let me know if you need me to recapture anything.

Thanks for the new gamescopectl commits @sharkautarch. Running tracy with sudo in drm mode has its own set of issues (running steam as root is a big no), but even without the hardware vblank markers that sudo would provide, the tracy profiler has shown some interesting stuff.

Here we have gamescopectl adaptive_sync 0, so adaptive sync disabled.
adaptive_sync_0

Here we have gamescopectl adaptive_sync 1
adaptive_sync_1

It seems like these repeated longer paint_all() -> Present() calls may be the reason I'm seeing decreased performance with adaptive sync on? The longer initial paint_all() -> Present( &frameInfo,async ) seems normal when changing between adaptive sync options

I also exported my tracy session since I'll be honest, most of the code in gamescope is still a big ??? for me 🐸
adaptive-sync-testing.tracy.tar.gz

Edit:
Well it turns out I get much clearer results if I also force the Steam UI to continuously render video frames with the debugging parameter for it:
adaptive-sync-constant-render-with-mangohud.tracy.tar.gz

image

It seems like adaptive sync also gives me this lockable which does not seem to show up when gamescope is not using adaptive sync. This could be as designed so maybe it's a nothingburger but that's the other major difference between the two states (aside from the very erratic vblanking of course)

image

Just as a side note - it seems tracy profiling triggers some instability in gamescope-session if monitoring for too long.

Screenshot_20240724_110622
paint_all-stats
gfx-0

edit: on the left is adaptive_sync 0 and the right is adaptive_sync 1, with continuously rendered frames in the Steam UI on the Home page of Game Mode

took some time to dive deeper into gpuvis today after packaging it for Fedora, and the results are pretty similar to what I was seeing with Tracy (although I've still only been able to run that as non-root so gamescope/steam don't complain) but that's a good indication of its accuracy @sharkautarch

@matte-schwartz
a day or so ago, I had made a new commit to my tracy PR to attempt to fix the instability you saw
Are you still having any issues w/ crashing w/ the tracy PR?

It seems like adaptive sync also gives me this lockable which does not seem to show up when gamescope is not using adaptive sync.

Unless you changed the setting yourself, Tracy profiler only displays lock-events wherein at least one thread is blocked waiting to take a lock (meaning a different thread has already acquired the lock).

So it could be an issue in a different scenario, but from looking at your screenshot, there's no lock contention during the long 24ms gamescope-xwm frame so it is extremely unlikely that it is directly causing lag.
Given the extremely low frequency of lock contention, and the duration looking somewhat short, I'm pretty sure it is unlikely that this could be indirectly causing lag either.

However, it's possible that the increased lock contention is a symptom of whatever underlying cause increases lag when mangohud is shown with adaptive sync on.

@matte-schwartz a day or so ago, I had made a new commit to my tracy PR to attempt to fix the instability you saw Are you still having any issues w/ crashing w/ the tracy PR?

No more crashing, was using it merged on top of gamescope master for a bit yesterday

@sharkautarch
On your edit, thanks for the additional context and breakdown. The intricacies of what you're talking about are a bit beyond me, so I think I've taken this as far as I can

I'll wait and see what @Joshua-Ashton makes of this when they get the chance to check out the issue

@matte-schwartz
I took a look at the tracy file you sent me
From looking more closely, it seems like vblankmanager's wait times are sometimes longer than they should be, briefly spiking up to ~19-20 ms.
gamescope_issue_1369_tracy_screenshot
gamescope_issue_1369_tracy_screenshot_2

W/ the two screenshots above, the vertical bars at the top show the 'frame-timing' for vblankmanager.
(For those that are unaware, vblankmanager gives gamescope a schedule for when and how long it should wait in-between frames. So w/ the tracy integration, each profiled 'frame' for vblankmanager measures the time from the first update sent to vblankmanager (not including updates from the previous vblank) until gamescope wakes back up from idling.)

So the yellow bars are vblank times of ~19-20ms, and the green bars are around ~11ms
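To put those durations in context, they can be compared against the panel's refresh interval. This is a rough sketch only: it assumes the trace was captured on a 90 Hz panel such as the OLED Deck mentioned earlier, and the function names are illustrative, not part of gamescope or Tracy.

```shell
#!/bin/sh
# Relate measured frame durations (ms) to refresh rates (Hz).
# All function names here are hypothetical helpers for illustration.

interval_ms() {         # $1 = refresh rate in Hz -> interval in ms
    awk -v hz="$1" 'BEGIN { printf "%.2f\n", 1000 / hz }'
}

rate_hz() {             # $1 = frame duration in ms -> equivalent rate in Hz
    awk -v ms="$1" 'BEGIN { printf "%.1f\n", 1000 / ms }'
}

intervals_spanned() {   # $1 = duration in ms, $2 = refresh rate in Hz
    awk -v ms="$1" -v hz="$2" 'BEGIN { printf "%.2f\n", ms * hz / 1000 }'
}

interval_ms 90            # length of one refresh interval at 90 Hz
rate_hz 20                # rate implied by a 20 ms vblankmanager frame
intervals_spanned 20 90   # how many 90 Hz intervals a 20 ms frame covers
```

Under that 90 Hz assumption, the ~11ms green bars are about one refresh interval long, while the ~19-20ms yellow bars cover a bit less than two intervals.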

The weird thing I'm seeing is that gamescope is sometimes trying to present in the middle of a vblank 'frame', as can be seen in the second screenshot.
It's possible that this is due to a 'stale' vblank being sent out, which should be picked up by gpuvis (I should probably update my tracy profiling code to pick that up as well...)

That being said, I also noticed that at one point, a vblank 'frame' somehow fully overlapped one present, and then also partially overlapped a tiny bit of the present before that:
gamescope_issue_1369_tracy_screenshot_3

When I last tested my tracy profiler integration w/ gamescope, (which was only in nested mode on my x11 desktop), I definitely saw some more innocuous looking overlap between vblank and present, but there was only ever a tiny bit of overlap...

Also, see the frame-timing overview for gamescope-xwm, where w/ the blue vertical bars, it's only ever taking at most 2.68ms (which are the slightly more elevated blue bars) for each recorded gamescope-xwm frame.
Interestingly enough, it seems like the slightly-longer 2.68ms frame-times are always the frames before the long 19-20ms vblankmanager frames:
gamescope_issue_1369_tracy_screenshot_4

And now that I look back at screenshot 2 and the vblankmanager frames, I've realized that all or most of the 19-20ms vblankmanager frames overlap a ~10ms present, and that ~10ms present somehow isn't inside of a gamescope-xwm frame.
The only reason a present wouldn't be inside of a gamescope-xwm frame is if steamcompmgr didn't receive a vblank. This is because I had arbitrarily decided to code my tracy integration that way.
This could mean one of two things:

  • I made a mistake with my tracy profiler integration code
  • Somehow, whatever is causing this bug, is causing gamescope to do too many non-vblank-triggered repaints. Which is throwing off gamescope's frame pacing.

In this fifth screenshot, we have a 2.68ms normal gamescope-xwm frame, followed by one vblankmanager frame w/ a ~10ms repaint, followed by a ~5 microsecond gamescope-xwm frame that doesn't do a present for some reason, followed by a ~11ms vblankmanager frame containing a 2.17ms repaint:
gamescope_issue_1369_tracy_screenshot_5
I'm now leaning towards the latter of the two things...

Something else I've noticed is that there is a significant amount of "hitching" between direct scan-out and gamescope compositing when VRR is enabled that is not present with VRR disabled. There's a solid 1-2 seconds of laggy response time I'd say. It's pretty noticeable if you're using the drm backend on a high refresh-rate monitor and invoke or move the cursor with VRR disabled, and then do the same with VRR enabled.