johnfanv2 / LenovoLegionLinux

Driver and tools for controlling Lenovo Legion laptops in Linux including fan control and power mode.

Home Page: https://github.com/johnfanv2/LenovoLegionLinux

[BUG] When the load and temperature of the NVIDIA GPU increase, the fan speed locks at 2300 rpm

rango886 opened this issue

Problem Description

When the load and temperature of the NVIDIA GPU increase, the fans stop following the configured fan curve: both lock at 2300 rpm.

Environment

Distribution: Ubuntu 23.10

Model name: Legion 7 16ACHg6 2021 / R9000k

CPU model: AMD Ryzen 9 5900HX

GPU model: NVIDIA RTX 3080

CPU stress test: sysbench (installed with sudo apt install sysbench)

GPU stress test: https://www.geeks3d.com/gputest/

Problem Reproduction

The fan configuration is as follows:

image

According to the readings from Psensor, at the current temperature, the fan1 and fan2 curves are normal.

CPU Stress Test

sudo apt install sysbench
sysbench --threads=16 cpu run

image

When the CPU temperature increases, the fan1 and fan2 curves remain normal, with fan1 at 1200 rpm and fan2 at 1100 rpm.

AMD Integrated GPU Stress Test

# Download GpuTest from https://www.geeks3d.com/gputest/
./start_furmark_windowed_1024x640.sh

image

When the temperature of the AMD GPU increases, the fan1 and fan2 curves remain normal, with fan1 at 1200 rpm and fan2 at 1100 rpm.

NVIDIA GPU Stress Test

# Switch to the NVIDIA GPU and restart
sudo prime-select nvidia
./start_furmark_windowed_1024x640.sh

image

When the temperature of the NVIDIA GPU increases, neither the fan1 nor the fan2 curve behaves normally: both fans run at 2300 rpm.

Speculation

I have tried:

  1. Enabling and disabling the mini curve
  2. Switching between different BIOS versions (GKCN64WW, GKCN60WW, GKCN59WW, GKCN58WW, GKCN57WW)
  3. Dedicated GPU mode and hybrid mode

None of these have resolved the issue. My speculation is that there might be three fan curves in the EC, corresponding to fan1, fan2, and potentially a fan3 for the NVIDIA GPU. If only the fan1 curve (CPU) and fan2 curve (AMD GPU) are set, without a corresponding fan3 curve (NVIDIA GPU), the fan speed may not behave as expected when the load and temperature of the NVIDIA GPU increase.

I am unsure how to verify the existence of an NVIDIA GPU fan curve.
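One indirect way to probe this is to record the NVIDIA GPU temperature together with the fan speed during the stress test and see whether the rpm steps with temperature or stays pinned. The Python sketch below is only an illustration under stated assumptions: it assumes `nvidia-smi` is on the PATH, that the driver exposes fan speeds through a hwmon device named `legion_hwmon` with `fan1_input` and `fan2_input` files, and all function names are hypothetical helpers rather than part of LenovoLegionLinux.

```python
import glob
import subprocess
import time
from pathlib import Path


def read_nvidia_temp():
    """Return the NVIDIA GPU temperature in degrees Celsius via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.splitlines()[0].strip())


def find_hwmon(name="legion_hwmon"):
    """Return the hwmon directory whose 'name' file matches, or None.

    The device name 'legion_hwmon' is an assumption; check the 'name'
    files under /sys/class/hwmon on the actual machine.
    """
    for name_file in glob.glob("/sys/class/hwmon/hwmon*/name"):
        if Path(name_file).read_text().strip() == name:
            return Path(name_file).parent
    return None


def read_fan_rpms(hwmon_dir):
    """Return (fan1_rpm, fan2_rpm) read from the hwmon directory."""
    return tuple(
        int((hwmon_dir / f"fan{i}_input").read_text()) for i in (1, 2)
    )


def collect_samples(hwmon_dir, count=60, interval_s=2.0):
    """Collect (gpu_temp, fan1_rpm) pairs while a stress test is running."""
    samples = []
    for _ in range(count):
        samples.append((read_nvidia_temp(), read_fan_rpms(hwmon_dir)[0]))
        time.sleep(interval_s)
    return samples


def temp_range_per_rpm(samples, bucket=100):
    """Group samples by rpm rounded to the nearest `bucket`.

    Returns {rounded_rpm: (min_gpu_temp, max_gpu_temp)}.
    """
    ranges = {}
    for temp, rpm in samples:
        key = (rpm + bucket // 2) // bucket * bucket
        low, high = ranges.get(key, (temp, temp))
        ranges[key] = (min(low, temp), max(high, temp))
    return ranges
```

Running `collect_samples(find_hwmon())` while FurMark is active and passing the result to `temp_range_per_rpm` gives one line of evidence. A single 2300 rpm bucket covering the whole temperature range would point to a fixed fallback speed, while several buckets that step with temperature would be more consistent with a separate curve. Neither outcome proves what the EC does internally.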

Debug log

EC Chip ID: 8227
EC Chip Version: 2a4
legion_laptop features: fancurve powermode platformprofile platformprofilenotify minifancurve
legion_laptop ec_readonly: 0
ACPI CFG error: 0
ACPI CFG: 2081289482
temperature access method: 1
CPU temperature error: 0
CPU temperature: 60
CPU temperature EC error: 0
CPU temperature EC: 60
CPU temperature ACPI error: 0
CPU temperature ACPI: 60
CPU temperature WMI error: 0
CPU temperature WMI: 0
CPU temperature WMI2 error: 0
CPU temperature WMI2: 60
CPU temperature WMI3 error: 0
CPU temperature WMI3: 0
GPU temperature error: 0
GPU temperature: 55
GPU temperature EC error: 0
GPU temperature EC: 55
GPU temperature ACPI error: 0
GPU temperature ACPI: 55
GPU temperature WMI error: 0
GPU temperature WMI: 0
GPU temperature WMI2 error: 0
GPU temperature WMI2: 55
GPU temperature WMI3 error: 0
GPU temperature WMI3: 0
fan speed access method: 1
1 fanspeed error: 0
1 fanspeed: 551
1 fanspeed EC error: 0
1 fanspeed EC: 551
1 fanspeed ACPI error: 0
1 fanspeed ACPI: 500
1 fanspeed WMI error: 0
1 fanspeed WMI: 0
1 fanspeed WMI2 error: 0
1 fanspeed WMI2: 500
1 fanspeed WMI3 error: 0
1 fanspeed WMI3: 0
2 fanspeed error: 0
2 fanspeed: 464
2 fanspeed EC error: 0
2 fanspeed EC: 464
2 fanspeed ACPI error: 0
2 fanspeed ACPI: 400
2 fanspeed WMI error: 0
2 fanspeed WMI: 0
2 fanspeed WMI2 error: 0
2 fanspeed WMI2: 400
2 fanspeed WMI3 error: 0
2 fanspeed WMI3: 0
powermode access method: 3
powermode error: 0
powermode: 2
powermode EC error: 0
powermode EC: 0
powermode ACPI error: -5
powermode ACPI: 0
powermode WMI error: 0
powermode WMI: 2
has custom powermode: 1
ACPI rapidcharge error: 0
ACPI rapidcharge: 0
WMI backlight 2 state: 0
WMI backlight 3 state: -14
WMI light IO port: 0
WMI light Y logo/lid: 0
EC minifancurve feature enabled: 1
EC minifancurve on cool: false
EC lockfancontroller error: 0
EC lockfancontroller: false
fanfullspeed error: 0
fanfullspeed: 0
fanfullspeed EC error: 0
fanfullspeed EC: 0
EC fan curve current point id: 4
EC fan curve points size: 9
Current fan curve in hardware:
rpm1|rpm2|acceleration|deceleration|cpu_min_temp|cpu_max_temp|gpu_min_temp|gpu_max_temp|ic_min_temp|ic_max_temp
0	 0	 2	 2	 0	 48	 0	 48	 127	 127
500	 400	 2	 2	 45	 54	 45	 54	 127	 127
500	 400	 2	 2	 51	 58	 51	 58	 127	 127
500	 400	 2	 2	 55	 62	 55	 62	 127	 127
500	 400	 2	 2	 59	 71	 59	 71	 127	 127
1200	 1100	 2	 2	 68	 76	 68	 76	 127	 127
1200	 1100	 2	 2	 72	 81	 72	 81	 127	 127
1200	 1100	 2	 2	 78	 90	 78	 90	 127	 127
1200	 1100	 2	 2	 87	 127	 87	 127	 127	 127
=====================
Current fan curve in hardware (WMI; might be empty)
rpm1|rpm2|acceleration|deceleration|cpu_min_temp|cpu_max_temp|gpu_min_temp|gpu_max_temp|ic_min_temp|ic_max_temp
=====================
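The hardware fan curve in the log above tops out at 1200 rpm for fan1 and 1100 rpm for fan2, so a reading of 2300 rpm cannot correspond to any configured point. The Python sketch below parses the table and makes that comparison explicit. The helper names and the 100 rpm tolerance are assumptions for illustration (the log shows measured speeds of 551 and 464 rpm against set points of 500 and 400), and none of this is part of LenovoLegionLinux.

```python
# Fan curve table copied from the debug log (whitespace-separated columns).
FAN_CURVE_TEXT = """\
rpm1|rpm2|acceleration|deceleration|cpu_min_temp|cpu_max_temp|gpu_min_temp|gpu_max_temp|ic_min_temp|ic_max_temp
0 0 2 2 0 48 0 48 127 127
500 400 2 2 45 54 45 54 127 127
500 400 2 2 51 58 51 58 127 127
500 400 2 2 55 62 55 62 127 127
500 400 2 2 59 71 59 71 127 127
1200 1100 2 2 68 76 68 76 127 127
1200 1100 2 2 72 81 72 81 127 127
1200 1100 2 2 78 90 78 90 127 127
1200 1100 2 2 87 127 87 127 127 127
"""

COLUMNS = (
    "rpm1", "rpm2", "acceleration", "deceleration",
    "cpu_min_temp", "cpu_max_temp",
    "gpu_min_temp", "gpu_max_temp",
    "ic_min_temp", "ic_max_temp",
)


def parse_fan_curve(text):
    """Parse the table into a list of dicts, one per fan curve point.

    Lines that do not have exactly ten fields (such as the header) are skipped.
    """
    points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue
        points.append(dict(zip(COLUMNS, map(int, fields))))
    return points


def max_configured_rpm(points):
    """Return the highest rpm1 and rpm2 values found in the curve."""
    return (
        max(p["rpm1"] for p in points),
        max(p["rpm2"] for p in points),
    )


def exceeds_curve(points, observed_rpm1, observed_rpm2, tolerance=100):
    """Return True if either observed speed is above the curve maximum.

    `tolerance` (an assumed value) absorbs the gap between set point and
    measured speed.
    """
    max1, max2 = max_configured_rpm(points)
    return observed_rpm1 > max1 + tolerance or observed_rpm2 > max2 + tolerance
```

With the table above, `exceeds_curve(parse_fan_curve(FAN_CURVE_TEXT), 2300, 2300)` is true, which supports the reading that something other than the configured curve is driving the fans under NVIDIA GPU load.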

image

I found that the fan controller lock in https://github.com/johnfanv2/LenovoLegionLinux is effective regardless of how the temperature changes.

I think I've found the answer. After downgrading the BIOS to GKCN24WW, the fan curve works normally and the fans no longer lock at 2300 rpm when the GPU load increases. However, with GKCN24WW the NVIDIA driver cannot be installed, whereas it installs properly with GKCN64WW.

Locking fan1 and fan2 at 1000 rpm with BIOS GKCN24WW is feasible, but when the GPU load increases, the fans produce significant electrical noise.

Well, that's unfortunate. It's making me consider buying a ROG laptop because it has Armoury Crate and G-Helper. 😭😭😭