[BUG] When the load and temperature of the NVIDIA GPU increase, the fan speed locks at 2300 rpm
rango886 opened this issue
Problem Description
When the load and temperature of the NVIDIA GPU increase, the fans stop following the configured fan curve and lock at about 2300 rpm.
Environment
Distribution: Ubuntu 23.10
Model name: Legion 7 16ACHg6 2021 / R9000K
CPU model: AMD Ryzen 9 5900HX
GPU model: NVIDIA RTX 3080
CPU stress test: sysbench (sudo apt install sysbench)
GPU stress test: GpuTest (https://www.geeks3d.com/gputest/)
Problem Reproduction
With my fan configuration applied, Psensor readings show that fan1 and fan2 follow their curves normally at the current temperature.
CPU Stress Test
sudo apt install sysbench
sysbench --threads=16 cpu run
As the CPU temperature rises, both fans keep following their curves, with fan1 at 1200 rpm and fan2 at 1100 rpm.
AMD Integrated GPU Stress Test
Download GpuTest from https://www.geeks3d.com/gputest/ and run:
./start_furmark_windowed_1024x640.sh
As the AMD integrated GPU temperature rises, both fans keep following their curves, with fan1 at 1200 rpm and fan2 at 1100 rpm.
NVIDIA GPU Stress Test
# Switch to the NVIDIA GPU and restart
sudo prime-select nvidia
./start_furmark_windowed_1024x640.sh
As the NVIDIA GPU temperature rises, neither fan follows its curve: fan1 and fan2 both lock at 2300 rpm.
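Psensor only shows live values. To capture the NVIDIA GPU temperature and both fan speeds in one timestamped log while FurMark runs, something like the following minimal sketch could be used. It assumes that the legion_laptop module exposes the fans through a hwmon device named legion_hwmon (check /sys/class/hwmon/hwmon*/name on your machine) and that nvidia-smi from the NVIDIA driver is installed. find_hwmon, read_fans, nvidia_temp and log_fans are hypothetical helper names, not part of LenovoLegionLinux.

```shell
#!/bin/sh
# Sketch: log the NVIDIA GPU temperature next to the fan speeds reported by hwmon.
# ASSUMPTION: legion_laptop registers a hwmon device whose "name" file reads
# "legion_hwmon"; pass a different name to log_fans if yours differs.

# Print the hwmon directory whose "name" file equals $1 (searched under $2).
find_hwmon() {
    want=$1
    root=${2:-/sys/class/hwmon}
    for d in "$root"/hwmon*; do
        [ -r "$d/name" ] || continue
        if [ "$(cat "$d/name")" = "$want" ]; then
            echo "$d"
            return 0
        fi
    done
    return 1
}

# Print "fan1=<rpm> fan2=<rpm> ..." for every fanN_input file in hwmon dir $1.
read_fans() {
    out=""
    for f in "$1"/fan*_input; do
        [ -r "$f" ] || continue
        label=$(basename "$f" _input)
        out="$out $label=$(cat "$f")"
    done
    echo "${out# }"
}

# Print the first NVIDIA GPU's temperature in degrees C, or "n/a" without nvidia-smi.
nvidia_temp() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits | head -n 1
    else
        echo "n/a"
    fi
}

# Log one line per second for $1 samples (default 60) from hwmon device $2.
log_fans() {
    n=${1:-60}
    name=${2:-legion_hwmon}
    dir=$(find_hwmon "$name") || { echo "hwmon device '$name' not found" >&2; return 1; }
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "$(date +%T) gpu_temp=$(nvidia_temp) $(read_fans "$dir")"
        i=$((i + 1))
        sleep 1
    done
}
```

Usage: source the functions, run log_fans 120 in one terminal and ./start_furmark_windowed_1024x640.sh in another. If the assumptions hold, each line pairs the NVIDIA GPU temperature with the fan1 and fan2 readings in rpm, which makes it easy to see at which temperature the fans jump to 2300 rpm.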
Speculation
I have tried:
- Enabling and disabling the mini fan curve
- Switching between different BIOS versions (GKCN64WW, GKCN60WW, GKCN59WW, GKCN58WW, GKCN57WW)
- Switching between dedicated GPU mode and hybrid mode
None of these resolved the issue. My guess is that the EC may hold three fan curves: one for fan1 (CPU), one for fan2 (AMD GPU), and possibly a third for the NVIDIA GPU. If only the fan1 (CPU) and fan2 (AMD GPU) curves are set and there is no corresponding NVIDIA GPU curve, the fans may not behave as expected when the NVIDIA GPU's load and temperature rise.
I am unsure how to verify the existence of an NVIDIA GPU fan curve.
Debug log
EC Chip ID: 8227
EC Chip Version: 2a4
legion_laptop features: fancurve powermode platformprofile platformprofilenotify minifancurve
legion_laptop ec_readonly: 0
ACPI CFG error: 0
ACPI CFG: 2081289482
temperature access method: 1
CPU temperature error: 0
CPU temperature: 60
CPU temperature EC error: 0
CPU temperature EC: 60
CPU temperature ACPI error: 0
CPU temperature ACPI: 60
CPU temperature WMI error: 0
CPU temperature WMI: 0
CPU temperature WMI2 error: 0
CPU temperature WMI2: 60
CPU temperature WMI3 error: 0
CPU temperature WMI3: 0
GPU temperature error: 0
GPU temperature: 55
GPU temperature EC error: 0
GPU temperature EC: 55
GPU temperature ACPI error: 0
GPU temperature ACPI: 55
GPU temperature WMI error: 0
GPU temperature WMI: 0
GPU temperature WMI2 error: 0
GPU temperature WMI2: 55
GPU temperature WMI3 error: 0
GPU temperature WMI3: 0
fan speed access method: 1
1 fanspeed error: 0
1 fanspeed: 551
1 fanspeed EC error: 0
1 fanspeed EC: 551
1 fanspeed ACPI error: 0
1 fanspeed ACPI: 500
1 fanspeed WMI error: 0
1 fanspeed WMI: 0
1 fanspeed WMI2 error: 0
1 fanspeed WMI2: 500
1 fanspeed WMI3 error: 0
1 fanspeed WMI3: 0
2 fanspeed error: 0
2 fanspeed: 464
2 fanspeed EC error: 0
2 fanspeed EC: 464
2 fanspeed ACPI error: 0
2 fanspeed ACPI: 400
2 fanspeed WMI error: 0
2 fanspeed WMI: 0
2 fanspeed WMI2 error: 0
2 fanspeed WMI2: 400
2 fanspeed WMI3 error: 0
2 fanspeed WMI3: 0
powermode access method: 3
powermode error: 0
powermode: 2
powermode EC error: 0
powermode EC: 0
powermode ACPI error: -5
powermode ACPI: 0
powermode WMI error: 0
powermode WMI: 2
has custom powermode: 1
ACPI rapidcharge error: 0
ACPI rapidcharge: 0
WMI backlight 2 state: 0
WMI backlight 3 state: -14
WMI light IO port: 0
WMI light Y logo/lid: 0
EC minifancurve feature enabled: 1
EC minifancurve on cool: false
EC lockfancontroller error: 0
EC lockfancontroller: false
fanfullspeed error: 0
fanfullspeed: 0
fanfullspeed EC error: 0
fanfullspeed EC: 0
EC fan curve current point id: 4
EC fan curve points size: 9
Current fan curve in hardware:
rpm1|rpm2|acceleration|deceleration|cpu_min_temp|cpu_max_temp|gpu_min_temp|gpu_max_temp|ic_min_temp|ic_max_temp
0 0 2 2 0 48 0 48 127 127
500 400 2 2 45 54 45 54 127 127
500 400 2 2 51 58 51 58 127 127
500 400 2 2 55 62 55 62 127 127
500 400 2 2 59 71 59 71 127 127
1200 1100 2 2 68 76 68 76 127 127
1200 1100 2 2 72 81 72 81 127 127
1200 1100 2 2 78 90 78 90 127 127
1200 1100 2 2 87 127 87 127 127 127
=====================
Current fan curve in hardware (WMI; might be empty)
rpm1|rpm2|acceleration|deceleration|cpu_min_temp|cpu_max_temp|gpu_min_temp|gpu_max_temp|ic_min_temp|ic_max_temp
=====================
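The EC curve in the log above tops out at 1200 rpm (rpm1) and 1100 rpm (rpm2), so 2300 rpm is not one of its programmed points. The following sketch checks this mechanically on a saved debug log. It assumes the log was saved with sudo cat /sys/kernel/debug/legion/fancurve > fancurve.txt (the debugfs path is an assumption; use whatever file your debug output came from) and that curve rows are exactly the lines made of ten integers. curve_max_rpm and curve_has_rpm are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: inspect the "Current fan curve in hardware" table(s) of a legion_laptop
# debug log. ASSUMPTION: the default path below is where the debug output lives.
DEFAULT_CURVE=/sys/kernel/debug/legion/fancurve

# Usage: curve_max_rpm [FILE] -> prints "<max rpm1> <max rpm2>", fails if no rows.
curve_max_rpm() {
    awk '
        # A curve row is exactly 10 whitespace-separated integers.
        NF == 10 && $0 ~ /^[0-9 \t]+$/ {
            if ($1 > m1) m1 = $1
            if ($2 > m2) m2 = $2
            found = 1
        }
        END { if (found) print m1 + 0, m2 + 0; else exit 1 }
    ' "${1:-$DEFAULT_CURVE}"
}

# Usage: curve_has_rpm RPM [FILE] -> exit status 0 if RPM is any row's rpm1 or rpm2.
curve_has_rpm() {
    awk -v rpm="$1" '
        NF == 10 && $0 ~ /^[0-9 \t]+$/ && ($1 == rpm || $2 == rpm) { hit = 1 }
        END { exit hit ? 0 : 1 }
    ' "${2:-$DEFAULT_CURVE}"
}
```

On the log above, curve_max_rpm fancurve.txt prints 1200 1100 and curve_has_rpm 2300 fancurve.txt returns a non-zero status, which only confirms that the 2300 rpm observed under NVIDIA load is not a value from this table; it does not by itself prove that a separate NVIDIA GPU curve exists.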
I found that the lock fan controller option in https://github.com/johnfanv2/LenovoLegionLinux works: it holds the fan speed regardless of how the temperature changes.
I think I've found the cause. After downgrading the BIOS to GKCN24WW, the fans follow the curve normally and no longer lock at 2300 rpm when the GPU load increases. However, with GKCN24WW the NVIDIA driver cannot be installed, whereas it installs properly with GKCN64WW.
With BIOS GKCN24WW, locking fan1 and fan2 at 1000 rpm works, but when the GPU load increases the fans produce significant electrical noise.
Well, that's unfortunate. It's making me consider buying a ROG laptop because it has Armoury Crate and G-Helper.😭😭😭