NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

[555.42.02] D3cold on Turing Mobile not working with kernel 6.9.2. Works with closed driver.

dagbdagb opened this issue

NVIDIA Open GPU Kernel Modules Version

550.78

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Gentoo Linux x86_64 6.7.9-gentoo

Kernel Release

6.7.9-gentoo, own config

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

NVIDIA GeForce RTX 2070 with Max-Q Design

Describe the bug

I noticed my laptop was slightly warmer than expected. This was on 6.8.9-gentoo.
A number of reboots later, I can state that:

dagb@gillette:~ (20:32) $ cat /proc/driver/nvidia/gpus/0000\:01\:00.0/power 
Runtime D3 status:          Not supported
Video Memory:               Active

GPU Hardware Support:
 Video Memory Self Refresh: Not Supported
 Video Memory Off:          Supported

... is the result if the nvidia-drivers package is built with the kernel-open USE flag on Gentoo, running gentoo-sources-6.7.9.

If built with -kernel-open (the leading '-' means 'no', i.e. the closed kernel driver), I have fine-grained control again.

HOWEVER, please also note: I tried both variants (open/closed kernel driver) on 6.8.9 as well, and there I get 'Not supported' in both cases.

I have not bisected the issue to a particular kernel version. I just happened to have 6.7.9 on disk.
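The check above is easy to repeat across kernel and driver combinations with a small script. The following is a minimal Python sketch, not part of the driver or of this report. It assumes the power file keeps the `Key: value` layout shown above; the function name `parse_power_file` is made up for illustration.

```python
from pathlib import Path
from typing import Dict


def parse_power_file(text: str) -> Dict[str, str]:
    """Collect 'Key: value' lines from the driver's power file.

    Section headers such as 'GPU Hardware Support:' carry no value
    and are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    # One directory per GPU, named after its PCI address.
    for power in sorted(Path("/proc/driver/nvidia/gpus").glob("*/power")):
        fields = parse_power_file(power.read_text())
        status = fields.get("Runtime D3 status", "missing")
        print(f"{power.parent.name}: Runtime D3 status = {status}")
```

Running it after each reboot gives one line per GPU, which makes it simpler to record which kernel and driver combination reports 'Not supported'.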

To Reproduce

  • run gentoo
  • install gentoo-sources-6.7.9
  • build/install kernel
  • build/install nvidia-drivers:
[ebuild   R    ] x11-drivers/nvidia-drivers-550.78:0/550::gentoo  USE="X kernel-open modules static-libs strip tools wayland -dist-kernel -modules-compress -modules-sign -persistenced -powerd" ABI_X86="(64) -32" 0 KiB
  • driver options:

blacklist nouveau
options nvidia-drm modeset=0
options nvidia NVreg_DeviceFileGID=27
options nvidia NVreg_DeviceFileMode=432
options nvidia NVreg_DeviceFileUID=0
options nvidia NVreg_ModifyDeviceFiles=1
options nvidia NVreg_TemporaryFilePath=/var/tmp
options nvidia NVreg_PreserveVideoMemoryAllocations=0
options nvidia NVreg_DynamicPowerManagement=0x02
options nvidia NVreg_EnableS0ixPowerManagement=1
options nvidia NVreg_UsePageAttributeTable=1
alias char-major-195 nvidia
alias /dev/nvidiactl char-major-195
remove nvidia modprobe -r --ignore-remove nvidia-drm nvidia-modeset nvidia-uvm nvidia


  • udev rules:

dagb@gillette:~ (20:45) $ cat /etc/udev/rules.d/80-nvidia-pm.rules

# Remove NVIDIA USB xHCI Host Controller devices, if present
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c0330", ATTR{remove}="1"

# Remove NVIDIA USB Type-C UCSI devices, if present
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c8000", ATTR{remove}="1"

# Remove NVIDIA Audio devices, if present
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x040300", ATTR{remove}="1"

# Enable runtime PM for NVIDIA VGA/3D controller devices on driver bind
ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="auto"

# Disable runtime PM for NVIDIA VGA/3D controller devices on driver unbind
ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="on"
ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="on"
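To confirm that the udev rules above actually took effect, the same sysfs attributes they write can be read back. This is a rough Python sketch under a few assumptions: the devices live under `/sys/bus/pci/devices`, the `vendor` and `class` attributes use the same hex strings the rules match on, and `power/runtime_status` is present. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional

NVIDIA_VENDOR = "0x10de"
# VGA and 3D controller classes, the same two the udev rules match.
GPU_CLASSES = ("0x030000", "0x030200")


def read_attr(dev: Path, name: str) -> Optional[str]:
    """Return a stripped sysfs attribute, or None if it cannot be read."""
    try:
        return (dev / name).read_text().strip()
    except OSError:
        return None


def nvidia_gpu_runtime_pm(sysfs_root: str = "/sys/bus/pci/devices") -> Dict[str, dict]:
    """Map each NVIDIA GPU's PCI address to its runtime PM attributes."""
    results = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return results
    for dev in sorted(root.iterdir()):
        if read_attr(dev, "vendor") != NVIDIA_VENDOR:
            continue
        if read_attr(dev, "class") not in GPU_CLASSES:
            continue
        results[dev.name] = {
            "control": read_attr(dev, "power/control"),
            "runtime_status": read_attr(dev, "power/runtime_status"),
        }
    return results


if __name__ == "__main__":
    for addr, attrs in nvidia_gpu_runtime_pm().items():
        print(addr, attrs)
```

If the bind rule fired, `control` should read `auto`. Whether `runtime_status` ever reaches `suspended` is a separate question and depends on the driver reporting Runtime D3 as supported.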




Bug Incidence

Always

nvidia-bug-report.log.gz

[nvidia-bug-report.log.gz](https://github.com/NVIDIA/open-gpu-kernel-modules/files/15287343/nvidia-bug-report.log.gz)


More Info

I *think* 6.6.30 works with both the open and the closed kernel driver. Will give it another spin to verify, and update this ticket with the result.

Right. So 6.6.30 also fails with the open driver. If this was always the case, then the bug appears to be with the closed driver, and whatever we are looking for happened between kernel 6.7.9 and 6.8.9. Sigh. And the entire ticket belongs somewhere else, I presume?

> And the entire ticket belongs somewhere else, I presume?

Yes, here: https://forums.developer.nvidia.com/c/gpu-graphics/linux/148

Seeing as this is still open, I might as well continue here.

In light of this driver being considered as the default for the Linux NVIDIA drivers, I would like to point out that in order to get RTD3/D3cold working with my Turing 2070 mobile, I must:

  • use the proprietary kernel driver
  • disable loading the GPU firmware

Any other combination ends up with "Runtime D3 status: Not supported".

This applies to kernel version 6.9.2-gentoo and nvidia-drivers 555.42.02.
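When comparing combinations like these, it can help to decode the module options in effect for each one. The sketch below parses `options nvidia ...` lines in the format listed under To Reproduce and annotates a few of them. Several things here are assumptions rather than statements from this report: the meanings of the `NVreg_DynamicPowerManagement` values are taken from NVIDIA's driver README, and `NVreg_EnableGpuFirmware=0` is assumed to be the option behind "disable loading the GPU firmware", which the report does not name. The function names are hypothetical.

```python
from typing import Dict

# Assumed meanings, per NVIDIA's driver README; verify against the README
# shipped with your driver version.
DPM_MODES = {0: "disabled", 1: "coarse-grained", 2: "fine-grained", 3: "driver default"}


def parse_nvidia_options(conf_text: str) -> Dict[str, str]:
    """Collect name=value pairs from 'options nvidia ...' lines only."""
    opts = {}
    for line in conf_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "options" and parts[1] == "nvidia":
            for assignment in parts[2:]:
                name, sep, value = assignment.partition("=")
                if sep:
                    opts[name] = value
    return opts


def describe_options(opts: Dict[str, str]) -> Dict[str, str]:
    """Annotate the options that matter for runtime power management."""
    notes = {}
    if "NVreg_DynamicPowerManagement" in opts:
        # Base 0 accepts both '0x02' and '2'.
        mode = int(opts["NVreg_DynamicPowerManagement"], 0)
        notes["NVreg_DynamicPowerManagement"] = DPM_MODES.get(mode, "unknown")
    if "NVreg_DeviceFileMode" in opts:
        # The value is decimal in the config; show it as an octal file mode.
        notes["NVreg_DeviceFileMode"] = oct(int(opts["NVreg_DeviceFileMode"], 0))
    if "NVreg_EnableGpuFirmware" in opts:  # assumed option name, see lead-in
        disabled = int(opts["NVreg_EnableGpuFirmware"], 0) == 0
        notes["NVreg_EnableGpuFirmware"] = (
            "firmware loading disabled" if disabled else "firmware loading not disabled"
        )
    return notes
```

With the options from this report, `NVreg_DynamicPowerManagement=0x02` decodes to fine-grained and `NVreg_DeviceFileMode=432` is the decimal form of octal mode 0660.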

I will happily provide an updated nvidia-bug-report.log.gz if required. If so, let me know if you want it with a particular combo of driver and driver options enabled.