NVRM: krcWatchdogCallbackVblankRecovery_IMPL: NVRM-RC: RM has detected that 7 Seconds without a Vblank Counter Update on head:D0
scaronni opened this issue
NVIDIA Open GPU Kernel Modules Version
550.78
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Fedora 40
Kernel Release
6.8.7-300.fc40.x86_64
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- I am running on a stable kernel release.
Hardware: GPU
NVIDIA GeForce RTX 4070 SUPER
Describe the bug
The kernel log is being spammed with these lines:
[ 6614.717414] NVRM: Xid (PCI:0000:01:00): 16, pid='<unknown>', name=<unknown>, Head 00000003 Count 0000f82a
[ 6614.717420] NVRM: krcWatchdogCallbackVblankRecovery_IMPL: NVRM-RC: RM has detected that 7 Seconds without a Vblank Counter Update on head:D0
After a few iterations of the pair, it keeps spamming NVRM: krcWatchdogCallbackVblankRecovery_IMPL [...].
To Reproduce
Just boot the system with the open kernel modules installed.
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
No response
I can confirm this behavior with driver 550.54.15-1 (newest from the CUDA repo), Debian unstable, custom-built kernels of at least versions 6.8.9, 6.9.0 and 6.9.1, a GeForce RTX 4090 and 5 displays connected. For me, it happens on head C0.
The displays are 2 DP screens (both running), 1 Valve Index VR headset on DP (not running), 1 HDMI screen (not connected to power), and 1 HDMI TV by LG (turned off or on).
The messages repeat almost exactly every 8.192 seconds and stop at some point (after ~40 minutes this time; not sure if that is consistent).
The error only occurs when the LG TV is a) connected and b) not enabled in X. I am not sure whether the message indicates an actual problem, but I would prefer not to have my logs flooded with it.
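For anyone who wants to check the repeat interval on their own machine, here is a minimal sketch that averages the gap between consecutive watchdog lines using the `[ seconds.micros]` timestamps that dmesg prints. The helper names (`parse_dmesg_timestamp`, `is_vblank_watchdog_line`, `mean_watchdog_interval`) are made up for illustration and are not part of the driver or any tool; it also assumes the log lines have already been read into an array of strings.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the "[ seconds.micros]" prefix of a dmesg line.
 * Returns 1 if a timestamp was parsed, 0 otherwise. */
static int parse_dmesg_timestamp(const char *line, double *out_seconds)
{
    return sscanf(line, " [ %lf", out_seconds) == 1;
}

/* Returns 1 if the line is the vblank watchdog message quoted in this issue. */
static int is_vblank_watchdog_line(const char *line)
{
    return strstr(line, "krcWatchdogCallbackVblankRecovery_IMPL") != NULL;
}

/* Mean gap in seconds between consecutive watchdog lines.
 * Returns -1.0 if fewer than two matching lines were found. */
static double mean_watchdog_interval(const char *const *lines, size_t count)
{
    double first = 0.0, last = 0.0, ts = 0.0;
    size_t matches = 0;

    for (size_t i = 0; i < count; i++) {
        if (!is_vblank_watchdog_line(lines[i]) ||
            !parse_dmesg_timestamp(lines[i], &ts))
            continue;               /* skip unrelated or untimestamped lines */
        if (matches == 0)
            first = ts;
        last = ts;
        matches++;
    }
    return matches < 2 ? -1.0 : (last - first) / (double)(matches - 1);
}
```

Feeding it the NVRM lines from the kernel log should give a value close to the interval reported above if the behavior is the same; other lines, such as the Xid messages, are skipped.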
I believe this should be fixed with 555.42.02. This is the relevant change so you can apply it to 550.xx as well:
diff --git a/src/nvidia/src/kernel/gpu/disp/head/kernel_head.c b/src/nvidia/src/kernel/gpu/disp/head/kernel_head.c
index 50e14fa..5da4a43 100644
--- a/src/nvidia/src/kernel/gpu/disp/head/kernel_head.c
+++ b/src/nvidia/src/kernel/gpu/disp/head/kernel_head.c
@@ -235,7 +235,8 @@ kheadReadVblankIntrState_IMPL
 )
 {
     // Check to make sure that our SW state grooves with the HW state
-    if (kheadReadVblankIntrEnable_HAL(pGpu, pKernelHead))
+    if (kheadReadVblankIntrEnable_HAL(pGpu, pKernelHead) &&
+        kheadGetDisplayInitialized_HAL(pGpu, pKernelHead))
     {
         // HW is enabled, check if SW state is not enabled
         if (pKernelHead->Vblank.IntrState != NV_HEAD_VBLANK_INTR_ENABLED)
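To make the effect of the changed condition easier to see, here is a small standalone model of it. This is only a sketch of one reading of the diff: `struct head_model` and both functions are hypothetical stand-ins, not the driver's `KernelHead` type or HAL calls, and whatever the branch does beyond the lines quoted above is not modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-head state; NOT the driver's KernelHead. */
struct head_model {
    bool hw_vblank_intr_enabled; /* models kheadReadVblankIntrEnable_HAL()     */
    bool display_initialized;    /* models kheadGetDisplayInitialized_HAL()    */
};

/* Before the patch: the HW interrupt-enable bit alone decides whether
 * the SW/HW state comparison branch is entered. */
static bool enters_check_before_patch(const struct head_model *h)
{
    return h->hw_vblank_intr_enabled;
}

/* After the patch: the branch is entered only if the display on that
 * head is also reported as initialized. */
static bool enters_check_after_patch(const struct head_model *h)
{
    return h->hw_vblank_intr_enabled && h->display_initialized;
}
```

In this model the two versions differ only for a head whose interrupt is enabled in hardware while its display is not initialized: the old condition enters the branch, the patched one does not.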
Thanks! I did not try applying the patch, but upgrading to 555.42.02, which is now packaged, fixes the issue for me.