gpu-next: flashing artifacts on some videos with interpolation and certain tscale algorithms
mia-0 opened this issue
mpv Information
mpv 0.38.0+git20240701.7c70df09 Copyright © 2000-2024 mpv/MPlayer/mplayer2 projects
libplacebo version: v7.349.0
FFmpeg version: 6.1.1
FFmpeg library versions:
libavcodec 60.31.102
libavdevice 60.3.100
libavfilter 9.12.100
libavformat 60.16.100
libavutil 58.29.100
libswresample 4.12.100
libswscale 7.5.100
Other Information
- Linux version: openSUSE Tumbleweed 20240701
- Kernel Version: Linux 6.9.5-1-default #1 SMP PREEMPT_DYNAMIC Tue Jun 18 07:38:24 UTC 2024 (c9c2e24) x86_64 x86_64 x86_64 GNU/Linux
- GPU Model: AMD Radeon 6700 XT (Navi 22)
- Mesa/GPU Driver Version: Mesa 24.1.2
- Window Manager and Version: sway 1.9
- Source mpv: local rebuild of openSUSE package updated to git master
- Introduced in version: unknown
Reproduction Steps
mpv --no-config --vo=gpu-next --interpolation --tscale=catmull_rom --video-sync=display-resample https://0x0.st/XA8c.mp4
Expected Behavior
Video plays normally without graphical glitches.
Actual Behavior
Bright single-frame flashes during motion.
Log File
Sample Files
BP09J_XfTDaYeCn7yNkqSA.mp4
I carefully read all instructions and confirm that I did the following:
- I tested with the latest mpv version to validate that the issue is not already fixed.
- I provided all required information including system and mpv version.
- I produced the log file with the exact same set of files, parameters, and conditions used in "Reproduction Steps", with the addition of --log-file=output.txt.
- I produced the log file while the behaviors described in "Actual Behavior" were actively observed.
- I attached the full, untruncated log file.
- I attached the backtrace in the case of a crash.
Looks like only spline16, spline36, spline64, sinc, lanczos, ginseng and catmull_rom are affected.
It seems like it happens whenever the sum of frame weights gets very close to zero. In this case, we have something like:
-> Filter offset -2,951833 = weight 0,002780
-> Filter offset -1,952048 = weight -0,011067
-> Filter offset -0,952263 = weight 0,041413
-> Filter offset 1,047880 = weight -0,035180
-> Filter offset 2,047665 = weight 0,008759
-> Filter offset 3,047450 = weight -0,001454
wsum: 0,005251
(using a 4-tap spline as an example, since that seemed to produce the worst artifacts)
It seems like we end up creating a sort of strange exaggerated sharpening filter, resulting in a lot of ringing whenever the filter kernel exactly aligns with the frame offsets like this. I'm not sure why exactly these filters suffer from it.
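To make the failure mode concrete, here's a small standalone sketch (not libplacebo code) that evaluates a Catmull-Rom kernel at frame offsets like the ones above, with the source frame nearest the kernel center missing. Catmull-Rom is used only because its closed form is well known; the log above came from a 4-tap spline, but the affected filters behave similarly:

// Standalone sketch (not libplacebo code): evaluate a Catmull-Rom kernel at
// frame offsets where the frame closest to the kernel center is missing,
// and show that the sum of weights collapses towards zero.
#include <math.h>
#include <stdio.h>

// Catmull-Rom cubic (Keys kernel with a = -0.5), radius 2
static double catmull_rom(double x)
{
    x = fabs(x);
    if (x <= 1.0)
        return 1.5 * x * x * x - 2.5 * x * x + 1.0;
    if (x <= 2.0)
        return -0.5 * x * x * x + 2.5 * x * x - 4.0 * x + 2.0;
    return 0.0;
}

int main(void)
{
    // Frame offsets (in vsync units) analogous to the log above: the frame
    // that would land at ~+0.05 is missing from the source.
    const double offsets[] = { -1.95, -0.95, /* hole */ 1.05, 2.05 };
    const int n = sizeof(offsets) / sizeof(offsets[0]);

    double wsum = 0.0;
    for (int i = 0; i < n; i++) {
        double w = catmull_rom(offsets[i]);
        printf("offset %+.2f -> weight %+.6f\n", offsets[i], w);
        wsum += w;
    }
    printf("wsum: %.6f\n", wsum); // ~0.006, dangerously close to zero
    return 0;
}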
So, looking more closely at it, it seems that this is some sort of cursed VFR source where individual frames are randomly missing. Whenever this happens, and the 'missing' frame happens to exactly align with the central lobe of the tscale filter kernel, the result explodes because of the normalization (division) by wsum. This is a form of temporal aliasing. (Even without the normalization, from a signal theory PoV, the result would tend towards 0 here, leading to flashing black frames instead of oversharpened glitched frames.)
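Dividing the logged weights by that tiny wsum makes the 'exaggerated sharpening filter' visible directly (a trivial sketch, using the numbers from the log above):

// Trivial sketch: normalize the logged 4-tap-spline weights by the logged
// wsum and print the effective per-frame weights used for mixing.
#include <stdio.h>

int main(void)
{
    const double weights[] = {
         0.002780, -0.011067,  0.041413,   // offsets -2.95, -1.95, -0.95
        -0.035180,  0.008759, -0.001454,   // offsets +1.05, +2.05, +3.05
    };
    const double wsum = 0.005251;

    // Prints roughly +0.53, -2.11, +7.89, -6.70, +1.67, -0.28: the lobes are
    // several times the frame brightness even though they still sum to ~1,
    // hence the bright ringing flashes.
    for (int i = 0; i < 6; i++)
        printf("normalized weight %d: %+.2f\n", i, weights[i] / wsum);
    return 0;
}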
I think the correct solution here would be to always sufficiently blur / widen the kernel to prevent any such 'holes' in the data from aliasing the kernel. But I'm not sure how this would look in practice. I can try it out.
Looking at the code, we already do this, so something about the explanation doesn't quite add up yet.
Okay, figured out how to improve things here. The math as written was only compensating for the case where the vsync duration exceeded the frame duration; it never accounted for the case of the frame duration itself exceeding 1.0. We can fix it like so:
diff --git a/src/renderer.c b/src/renderer.c
index 802480d7..e80c072c 100644
--- a/src/renderer.c
+++ b/src/renderer.c
@@ -3353,8 +3353,9 @@ bool pl_render_image_mix(pl_renderer rr, const struct pl_frame_mix *images,
for (int i = 1; i < images->num_frames; i++) {
if (images->timestamps[i] >= 0.0 && images->timestamps[i - 1] < 0) {
float frame_dur = images->timestamps[i] - images->timestamps[i - 1];
- if (images->vsync_duration > frame_dur && !params->skip_anti_aliasing)
- mixer.blur *= images->vsync_duration / frame_dur;
+ float sample_dur = PL_MAX(frame_dur, images->vsync_duration);
+ if (sample_dur > 1.0f && !params->skip_anti_aliasing)
+ mixer.blur *= sample_dur;
break;
}
}
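To see why this helps with the case above: with the timestamps in vsync units, the frame pair straddling the target vsync spans about 2.0 vsyncs when a source frame is missing, while vsync_duration is about 1.0, so the old condition (vsync_duration > frame_dur) never fired and the kernel could fall entirely into the hole. A rough standalone sketch with illustrative numbers (not the real pl_frame_mix structs):

// Rough sketch with made-up numbers: how the old vs. new logic picks the
// kernel blur when one source frame is missing around the target vsync.
#include <stdio.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

int main(void)
{
    double ts_prev = -0.95;      // last frame before the target vsync
    double ts_next =  1.05;      // next frame after it; the ~+0.05 one is missing
    double vsync_duration = 1.0; // in vsync units by definition

    double frame_dur = ts_next - ts_prev;  // 2.0 across the hole

    // Old logic: only widen when vsyncs are longer than frames.
    double blur_old = 1.0;
    if (vsync_duration > frame_dur)
        blur_old *= vsync_duration / frame_dur;  // never taken here

    // New logic: widen whenever either interval exceeds one vsync.
    double blur_new = 1.0;
    double sample_dur = MAX(frame_dur, vsync_duration);
    if (sample_dur > 1.0)
        blur_new *= sample_dur;                  // 2.0: kernel now spans the hole

    printf("blur old: %.2f, new: %.2f\n", blur_old, blur_new);
    return 0;
}

With blur = 2.0, the hole-aligned offsets from the earlier sketch evaluate to a weight sum on the order of 1 instead of ~0.006, so the normalization no longer explodes.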
(And actually, maybe we should even lower the blur (i.e. sharpen the kernel) in the event that the frame duration is significantly below 1?)
One thing I don't like about this specific approach is that it ends up switching the filter kernel size instantly from e.g. 1 to 2 whenever the 'missing' frame hits the center. It would be nicer if we could somehow interpolate the filter size itself, so that the filter grows and shrinks dynamically to adapt to the interval. Maybe something simple like a center-weighted average. 🤷🏻
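For the record, a very rough sketch of that idea (purely illustrative, untested; the function name and the weighting curve are invented): derive the mix blur from a weighted average of the local frame intervals, with intervals closer to the mix center counting more, so the kernel width eases in and out instead of jumping from 1 to 2:

// Purely illustrative sketch of the "center-weighted average" idea: estimate
// an effective sample duration from the frame intervals near the target
// vsync (timestamp 0), weighting each interval by how close its midpoint is
// to the center, and derive the blur from that.
static float mix_blur_estimate(const double *timestamps, int num_frames,
                               double vsync_duration)
{
    double wsum = 0.0, dur = 0.0;
    for (int i = 1; i < num_frames; i++) {
        double frame_dur = timestamps[i] - timestamps[i - 1];
        double mid = 0.5 * (timestamps[i] + timestamps[i - 1]);
        double w = 1.0 / (1.0 + mid * mid);  // arbitrary center-weighting curve
        dur  += w * frame_dur;
        wsum += w;
    }
    double sample_dur = wsum > 0.0 ? dur / wsum : vsync_duration;
    if (sample_dur < vsync_duration)
        sample_dur = vsync_duration;
    return sample_dur > 1.0 ? (float) sample_dur : 1.0f;
}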