NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

Unable to change power limit with nvidia-smi

machinedgod opened this issue · comments

NVIDIA Open GPU Kernel Modules Version

530.41.03-1

Does this happen with the proprietary driver (of the same version) as well?

Yes

Operating System and Version

Arch Linux

Kernel Release

Linux 6.2.8-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 22 Mar 2023 22:52:35 +0000 x86_64 GNU/Linux

Hardware: GPU

GPU 0: NVIDIA GeForce RTX 2060 (UUID: GPU-2f685ce6-33f4-db75-ce05-81d1723a6ddb)

Describe the bug

Before recent update, I was able to execute:
~$ sudo nvidia-smi --power-limit 60

and have it work as expected.

After update, this is the output:

Changing power management limit is not supported for GPU: 00000000:01:00.0.
Treating as warning and moving on.
All done.

Changing power limits caused observable changes both in temperature and in performance, so I am pretty sure my GPU supports it.
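A minimal sketch for checking what the driver itself reports before trying to change anything. It assumes `nvidia-smi` is on the PATH and uses query fields listed by `nvidia-smi --help-query-gpu`; the helper names (`parse_power_limits`, `query_power_limits`) are illustrative only and do not appear in this thread.

```python
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`.
FIELDS = ["power.limit", "power.default_limit", "power.min_limit", "power.max_limit"]


def parse_power_limits(csv_line):
    """Parse one CSV row of nvidia-smi output into {field: watts or None}."""
    values = [v.strip() for v in csv_line.split(",")]
    result = {}
    for field, raw in zip(FIELDS, values):
        try:
            result[field] = float(raw)
        except ValueError:
            # Fields the driver does not expose come back as bracketed
            # text such as "[N/A]" instead of a number.
            result[field] = None
    return result


def query_power_limits(gpu_index=0):
    """Run nvidia-smi for one GPU and return its parsed power limits."""
    cmd = [
        "nvidia-smi",
        "-i", str(gpu_index),
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_power_limits(out.strip().splitlines()[0])
```

Calling `query_power_limits()` on an affected system would presumably show whether the min/max limits are still reported at all, which may help distinguish "limit locked" from "limit unknown".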

For context:
I found that the default power limit of 80W tends to heat up the GPU enough that it starts throttling itself and causes stuttering. 60W seemed to work perfectly and kept it under 63C during everything, without boosting my fan. The computer itself is a laptop (which explains the issues with heat dissipation), a Lenovo Legion Y740, and I have a pretty good cooling pad to help out.

To Reproduce

~# nvidia-smi --power-limit 60

Bug Incidence

Always

nvidia-bug-report.log.gz

More Info

Just a bit more context:
you may notice that in my Xorg config I use an option to skip the EDID check for the HDMI-0 output. The reason is that either the driver doesn't recognize my (about 7-year-old) monitor, or the checksum the monitor outputs is invalid, and without the option it wouldn't let me use Full HD resolution on that monitor.

No idea if this is in any way related (I presume it's not), but this setup worked for over 5-6 months, so I doubt it's related.

I also cannot change the power limit with nvidia-smi on this driver, while I can on the 525 series (I have an RTX 3050 laptop).

On 530 it says "Changing power limit is not supported for this GPU".

So which is the intended behavior? Make up your mind, NVIDIA.

I really hope it's not the latter, because of this fun issue which also happens to me: #435

Relevant discussion here

Oh, so it's a wide-reaching problem, not just for the 2060 and 3050.
I suppose it has enough visibility, so it should probably be fixed soonish.

With 530.30.02 (beta) the power limits were working fine for the first time on my laptop with a RTX 2060 Mobile 6GB. It broke after 530.42.03 update.

Using the 525 version of the driver, my RTX 3060 was able to limit power for the first time, but since updating to the 530 version of the driver, it has been impossible to limit power.

If anybody here is using a Lenovo gaming laptop with the 530 drivers, can you confirm this other power limit issue, #492?

got same issue

on a 3060 AMD legion 5

Legion S7 15ACH6 / AMD Ryzen 7 5800H / NVIDIA GeForce RTX 3060

Same issue

driver version?

530.41.03-15

try 535 beta

unless it doesn't exist for okm then test proprietary

Still have the issue on 535 BETA proprietary which is the only one with the "fix" i think

test with 525 branch (prop, or okm if exists)

Ryzen 5 5600H / NVIDIA GeForce RTX 3060 Laptop
Same issue
on version 525.78.01 everything works fine

did you test 535 (prop)

@barokvanzieks Yesterday I checked the work of the power limit on many versions of the drivers.
(Ryzen 5 5600H / NVIDIA GeForce RTX 3060 Laptop / Ubuntu 20.04)
(command: nvidia-smi -pl 50)

  • 525.78.01 power limit works fine
  • 525.85.05 power limit works fine
  • 525.89.02 power limit NOT CHECKED
  • 525.105.17 power limit works fine
  • 530.41.03 power limit is NOT WORKING
  • 535.43.02 power limit is NOT WORKING

It turns out the power limit has not worked since the 53X.XX.XX versions.
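The test results listed above can be grouped by driver branch to make the pattern explicit. This is only a sketch that re-encodes the versions and outcomes reported in this comment (one RTX 3060 Laptop on Ubuntu 20.04); the names `REPORTED`, `branch` and `summarize` are illustrative.

```python
# Results reported above for `nvidia-smi -pl 50` on one RTX 3060 Laptop.
# True = worked, False = did not work, None = not checked.
REPORTED = {
    "525.78.01": True,
    "525.85.05": True,
    "525.89.02": None,
    "525.105.17": True,
    "530.41.03": False,
    "535.43.02": False,
}


def branch(version):
    """Return the driver branch (leading number) of a version string."""
    return int(version.split(".")[0])


def summarize(results):
    """Group outcomes by branch, skipping versions that were not checked."""
    summary = {}
    for version, ok in results.items():
        if ok is not None:
            summary.setdefault(branch(version), set()).add(ok)
    return summary


print(summarize(REPORTED))
```

Every checked 525 release worked and every checked 530/535 release did not, which is consistent with the behavior changing at the 530 branch on this machine.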

hmmmmm ok i havent tested 535 myself, guess ill stay on 525 for now

i checked latest version of nvidia driver "535.54.03" with dkms and still NOT WORKING

(i5-12500H / NVIDIA GeForce RTX 3060 Laptop / EndeavourOS)

I think this is intended at this point

They also unlocked it on Windows around driver 528 and locked it again on laptops shortly after, so maybe this is only for desktop GPUs.

@kleidiss This is a very important capability. Laptops get very hot and drain the battery quickly. This feature is required!

I agree. It would be worthwhile checking if anyone diffed the sources and figured out how to make a patch to enable it back. If not, I will attempt that this weekend, because right now - my laptop goes to 80-81C (with a REALLY good cooling pad)... previously I kept the framerate stable at 55C

I just noticed it no longer works on my system. Reverted to 525.105.17 and everything's good. Thank you.

The laptop is a TongFang GM5MG0O (BIOS N.1.09A08) // i7 10875H // RTX 3070 (BIOS 94.04.3F.00.83)
The OS is Linux Mint 21.1 // kernel 6.3.8-x64v3-xanmod1

I agree. It would be worthwhile checking if anyone diffed the sources and figured out how to make a patch to enable it back. If not, I will attempt that this weekend, because right now - my laptop goes to 80-81C (with a REALLY good cooling pad)... previously I kept the framerate stable at 55C

thatd be super cool of you

but yes 535 isn't working for me either and it's a pain in the arse since now i have to fuck with drivers to get OpenCL working

this feature is essential for me since i can up wattage by like 50 watts from stock 80

I agree. It would be worthwhile checking if anyone diffed the sources and figured out how to make a patch to enable it back. If not, I will attempt that this weekend, because right now - my laptop goes to 80-81C (with a REALLY good cooling pad)... previously I kept the framerate stable at 55C

question is which file would it be in...

Any updates? When is it going to work again?

@machinedgod you were going to make a patch, no?

Any updates? When is it going to work again?

535 was supposed to fix it but it didnt so nobody knows

I have bad news, I hope this bug will be fixed in the future.
report

fucking hell, guess we'll need a patch of our own

@machinedgod you were going to make a patch, no?

I was, but weekend had its own challenges, unfortunately this - as much as its a priority for me - is not high enough priority to override everything else.

I'll have to cram it into my schedule when I get some breathing space, sorry gents.

The thing with Nvidia is some major BS, tho 😠

i see

It's summer and laptops get hotter too. Being able to tweak the temperature of the GPUs is a necessity.

And what about gaming laptops which change power limits on Windows in gaming mode and now stuck with low power limit on Linux? Ah, I would like to give one famous quote about NVIDIA from one Finnish guy called Linus, but I won't.

@kuriot On Windows, changing the power limit is also not supported... No 53x.xx version can change the power limit on laptops.

@arutar I hope it's just a bug and that the answer on the NVIDIA forum saying it's by design was just a mistake. But I think they did it on purpose.

And what about gaming laptops which change power limits on Windows in gaming mode and now stuck with low power limit on Linux?

Try enabling the nvidia-powerd service; it enables Dynamic Boost.

The power limit is then changed automatically under high load. I tested it with the Superposition benchmark: at system startup nvidia-smi shows a 60W power limit, and during the benchmark nvidia-smi shows ~90W usage (it might reach 100W; I only ran the benchmark for ~10 seconds).

P.S. My config - 3060 RTX (laptop, 100W) + nvidia 535.54.03
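A small sketch for checking whether the service mentioned above is actually running, assuming a systemd-based distro and the unit name `nvidia-powerd.service` (the name used later in this thread). The `service_state` helper and its `run` parameter are illustrative; `run` exists only so the function can be exercised without systemd.

```python
import subprocess


def service_state(unit="nvidia-powerd.service", run=subprocess.run):
    """Return the state string printed by `systemctl is-active` for a unit."""
    proc = run(["systemctl", "is-active", unit], capture_output=True, text=True)
    # is-active prints e.g. "active", "inactive" or "failed". Its exit code is
    # non-zero when the unit is not active, so it is deliberately not checked.
    return proc.stdout.strip()
```

On systemd distros the service can typically be enabled with `sudo systemctl enable --now nvidia-powerd.service`. Reports in this thread on whether that helps are mixed, so treat it as something to try rather than a fix.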

still kinda meh

No, it doesn't matter if the service is running or not.

@kraskden The point is to set the upper power limit yourself, not to rely on automatic mode. You need to be able to specify a power range narrower than the firmware default (for example, default 0-115, user 0-50): nvidia-smi -pl 50
That way you can extend the laptop's battery life or reduce its heat.
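The idea of a user limit that must sit inside the firmware range can be sketched as a simple range check before building the `nvidia-smi` command. The 0-115 range is just the example from the comment above, and `build_power_limit_cmd` is a hypothetical helper, not part of any NVIDIA tool. On the affected drivers the resulting command is still rejected, so this only illustrates the intended behavior.

```python
def build_power_limit_cmd(watts, min_w, max_w, gpu_index=0):
    """Return the nvidia-smi argv for setting a limit, or raise if out of range."""
    if not (min_w <= watts <= max_w):
        raise ValueError(f"{watts} W is outside the allowed range {min_w}-{max_w} W")
    # -i selects the GPU, -pl is the short form of --power-limit.
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


# Example using the numbers from the comment: firmware range 0-115, user wants 50.
print(build_power_limit_cmd(50, 0, 115))
```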

@kraskden The question is to set the upper limit of power

OMG, I replied to a specific question and suggested an option to fix the problem with the low power limit on gaming laptops.

No, it doesn't matter if the service is running or not.

Really? I ran tests; without the powerd service my GPU is stuck at 60W.

My laptop has a default limit of 115W. Before the driver update I could raise it up to 140W (the true limit) or lower it at night so that the laptop ran quieter. Then they added nvidia-powerd.service, and setting a higher limit worked only when this service was turned off. Now it doesn't work at all. If the service is turned on, it doesn't go higher than 115W no matter what. And those extra 15-25 watts gave me a good FPS boost in heavy games.

i have a more extreme case of this where i have 80w stock limit and 130w as the maximum, so i get a massive performance boost out of this, nvidia-powerd literally does nothing useful for me except break the -pl option for me completely

And what about gaming laptops which change power limits on Windows in gaming mode and now stuck with low power limit on Linux?

Try enabling the nvidia-powerd service; it enables Dynamic Boost.

The power limit is then changed automatically under high load. I tested it with the Superposition benchmark: at system startup nvidia-smi shows a 60W power limit, and during the benchmark nvidia-smi shows ~90W usage (it might reach 100W; I only ran the benchmark for ~10 seconds).

P.S. My config - 3060 RTX (laptop, 100W) + nvidia 535.54.03

You were right. On my Lenovo Legion 5 performance modes are switched by Fn+Q and I never bothered to switch them on Linux. But yes, now to use higher power I need to switch the laptop to performance mode with these keys (and it's not connected to system, I guess it's some Lenovo thing hardcoded into BIOS) and with running nvidia-powerd.service I get 140W limit now.

I’m in the same boat, I can’t set my Zephyrus duo GX650PY with an RTX 4090 to anything above 115W either. In the past I could set it to 175W. Now the same has happened on Windows with GPU drivers above 536.23 locking the card to 115W. I ran Timespy on 536.23 and got a GPU score of 21 437. The same benchmark on driver 536.67 gives a score of 17 693… Did anyone find a solution other than rolling back the drivers?
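For scale, the numbers quoted in this comment can be turned into relative drops. This is plain arithmetic on the reported Time Spy scores and power caps; it does not establish that the lower cap is the only cause of the lower score.

```python
def percent_drop(before, after):
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Time Spy GPU scores reported for drivers 536.23 and 536.67.
print(f"score: {percent_drop(21437, 17693):.1f}% lower")  # score: 17.5% lower

# Power cap reported before (175 W) and after (115 W).
print(f"power cap: {percent_drop(175, 115):.1f}% lower")  # power cap: 34.3% lower
```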

On an NVIDIA GTX 1050 Ti, nvidia-smi -pl 100 doesn't work with the proprietary drivers either. If anyone knows how to save power on a GTX 1050 Ti, please write here.

same with GeForce MX250, wanted to make it cooler but I guess no such option now

Same issue here with proprietary drivers 535.98 on RTX 4060 Mobile (i9 13900H, Envy 16).

The same problem.
Power limit is stuck at 35W in Linux, but in Windows works perfectly with 50W+15W turbo (65W as designed).

Ubuntu 23.04, kernel 6.4.11, driver 535.86.05.
Asus ROG Flow Z13 2023 (RTX 4050 mobile)

I've managed to solve this problem by simply enabling the system service/daemon at startup named nvpower

Depending on your distro your mileage may vary.

I needed to disable this service because on Fedora it didn't work.

I've managed to solve this problem by simply enabling the system service/daemon at startup named nvpower

@xAlpharax What package is this a part of on what distro?

I needed to disable this service because on Fedora it didn't work.

Specifically Fedora with Wayland?

@xAlpharax What package is this a part of on what distro?

Void Linux shipped this service daemon with its package of the nvidia drivers.

@xAlpharax They do, eh? 🤔 It appears they package the driver similar to just about any other distro. They must have a separate package, but I can't find it.

@yochananmarqos Looking at the building process it so seems. I will try finding that out as well.

to anyone wanting to fix this temporarily at least: roll back drivers to latest in 525xx branch and wait until somebody here finds a fix/makes a patch or nVidia fixes it

Arch has an aur package for 525xx drivers in case anybody needs them

Good to know for the ones with access to the AUR

i just hope someone makes a patch eventually lol

all those "just enable powerd" solutions arent great because i want to use the quiet mode on my legion without capping power, and powerd also breaks manual control completely

@machinedgod hey just reminding you about the patch lol

to anyone wanting to fix this temporarily at least: roll back drivers to latest in 525xx branch and wait until somebody here finds a fix/makes a patch or nVidia fixes it

It's easier to buy an AMD than to get NVIDIA to fix something on Linux.

Lol
AMD does not have software as powerful as CUDA/TensorRT/cuDNN. People don't buy video cards only for games :)

Yep, but it doesn't work as expected on Linux. AMD doesn't have CUDA, but it has ROCm, and for me, as a data scientist, it works as expected for Julia programming. It's not as powerful, but it just works.

Hmm, what kind of expectations? Only the power limit? The power limit problem occurs mostly (only?) on laptops (mobile video cards). And rolling back to the 525 driver solves it (thanks guys for that). I didn't even reinstall CUDA; the versions were compatible.

Any other heavy computation (CUDA parallelism in C++ in my project) and AI models (TF/PyTorch) work perfectly on my office NVIDIA server with Linux. It even works perfectly on Jetson Nano and TX2. And now, without the low power limit on my laptop, I can do the same as described above (but on a smaller scale) :)

I can't imagine the same with AMD. OK, OpenACC/OpenCL let you work with other vendors, but the performance/flexibility of CUDA is better.

For me, having a way to control the power my machine draws during daily work is important in fieldwork. I don't think it's cool to have a hot box on my lap.
ROCm's performance is not that far below CUDA's, so it's a nice solution. You just get something with fewer bugs, as it's open source, and it works without any kind of workaround. I already have an AMD, and it's very nice. I'm only trying to fix another laptop with an NVIDIA GTX (that's very frustrating).

to anyone wanting to fix this temporarily at least: roll back drivers to latest in 525xx branch and wait until somebody here finds a fix/makes a patch or nVidia fixes it

It's easier to buy an AMD than to get NVIDIA to fix something on Linux.

oh yea lemme just like, pull a 6600xt out of my ass, it's not like money is a thing, right guys?

you can't dismiss an issue by going "oh yea just buy X thing instead" because people will need Y thing for work, education etc and might also want this bug fixed, also most do not want to sell a laptop purely because nvidia can't fix a bug

if you have nothing useful to say other than "buy amd", kindly screw off.

Well, that's tragic. I'm thinking it's easier to get another card, because NVIDIA won't fix it. It's an old problem for Linux, well explained by Linus Torvalds some years ago.
We can try to put pressure here, but I think it will not work.

oh yeah let me just work my ass off to get a new gpu instead of asking nvidia to fix it! do you even read what you're saying?

people buy one computer/laptop to use for years, not to swap parts on a whim

Where did I tell you to get a new GPU? I think you are reading and misinterpreting all these posts.
I said "it is easier to get a new card than to get NVIDIA to fix this issue". That's not telling you or anyone here to buy another card.
If you are working for NVIDIA and feel bad about this, I'm sorry, but it's the truth. NVIDIA's work for Linux devices is very poor. It lacks a lot of things.

theoretically for nvidia it's as easy as flipping a switch, getting them to do it is NOT however
Getting an AMD gpu is not "easier" because money, requirements and time are a constraint while commenting on a github issue is easy

Good luck to you trying to protect a corporation that doesn't even know you exist.

i am not working for nvidia, however it is very hard to get an amd dgpu laptop here and " its easier to get amd " is not helpful for anybody here, i literally just want to have an issue fixed, not told "just buy amd lol its easier" as its not helpful to anybody here, if you have the money, time and AMD gpus work for your needs, good for you, they don't for most people here/they can't just go out and get an amd dgpu laptop. if you have nothing to contribute to discussion of THIS ISSUE, stop

i am not defending nvidia, i literally just want an issue fixed

and your comments are NOT helpful in any way, again, if AMD works for you, good. it doesn't for other people here

Guys, this isn't a place for this kind of discussion.
Take it to PMs or I'm gonna lock this thread, which could be useful for keeping track of the progress on this feature.

Edit:
as for the patch, given that winter is coming and I'm gonna spend much more time at home, it's highly likely I'll try and tackle this. Busy summer, busy autumn.

Thanks for the information. I used this on Windows: I tested version 528.49-notebook-win10-win11-64bit-international-dch-whql (release date 2023.2.8) on my notebook and the power limit worked again. I also tested 531.18 (release date 2023.2.28) and it didn't work. I'm not good at English, sorry.

If it worked before, it was a bug.

Now it doesn't work because NVIDIA drivers obey the OEM set limits. Take it to the OEM.

Considering most Linux users run with Secure Boot disabled, someone could probably patch the kernel module or firmware to override this.

I have an Acer laptop (3070 Ti) and on Windows it works on the latest drivers as it should: 140W in normal mode and 150W in extreme and turbo. But on Linux I can't select turbo or extreme, so I'm stuck with 115W, or 130W with Dynamic Boost, but no more than that. Changing the GPU power limit was a very nice way to fix this problem. 525 looks to be my driver, I guess.

525 needs you to compile ffmpeg with older nvcodec headers now, ugh

[Pointing8422] Because of that, I updated my VBIOS a year ago with a custom one from TPU, in the Video BIOS Collection section → unofficial uploads. There are 2 or 3 BIOSes with a normal limit of 125W (Default Power Limit) that go up to 150W (Max Power Limit) when powerd is working. I also disabled AMD SmartShift in the notebook BIOS to prevent the GPU from throttling, set the max frequency limit to 1830 MHz and overclocked by +160 MHz (without the 1830 MHz limit, the X server became unstable at +110 MHz in games). Now the notebook really uses most of its potential, except that we still can't control the overclock table; I hope this will change this year. And this is all on the latest driver.
IDK, but maybe you forgot to put nvidia-dbus.conf from the 545 driver docs into /etc/dbus-1/system.d?
And after all, the power limit really isn't working: 525 accepts the value, but never actually limits anything. Now in 545 there is a message: Changing power management limit is not supported in current scope.
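A quick way to check the D-Bus suggestion above is to test whether the policy file is present. This sketch assumes the file name and directory exactly as given in the comment (`nvidia-dbus.conf` in `/etc/dbus-1/system.d`); `dbus_policy_installed` is an illustrative helper name.

```python
from pathlib import Path


def dbus_policy_installed(config_dir="/etc/dbus-1/system.d", filename="nvidia-dbus.conf"):
    """Return True if the nvidia-powerd D-Bus policy file is present."""
    return (Path(config_dir) / filename).is_file()
```

If this returns False and nvidia-powerd fails to start, copying the file from the driver's documentation directory, as the comment suggests, may be worth trying.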

With driver version 535.154.05 on Ubuntu 22.04.3, I can

nvidia-smi --power-limit=150

and reboot to make it take effect.

Tried both sudo nvidia-smi --power-limit=100 and sudo nvidia-smi -pl 100, on 525, 535, 545 and the newest 550, still seems that the nvidia-driver-525 is the only one working

System specs:
OS: Ubuntu 22.04.3
Graphics: GeForce RTX 3060

On versions other than nvidia-driver-525, I am still getting this error:

Changing power management limit is not supported for GPU: 00000000:01:00.0.
Treating as warning and moving on.
All done.

Did anyone manage to find any workaround for this?
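When testing several driver versions like this, it can help to detect the rejection from the text of the output, since nvidia-smi treats it as a warning and continues. The sketch below assumes the exit code alone may not signal the failure, so it checks both. The helper names and the `run` parameter are illustrative, and the command needs root to have any chance of succeeding.

```python
import subprocess

# Common prefix of the messages quoted in this thread.
NOT_SUPPORTED = "Changing power management limit is not supported"


def limit_rejected(output):
    """Return True if nvidia-smi output contains the 'not supported' warning."""
    return NOT_SUPPORTED in output


def try_set_power_limit(watts, gpu_index=0, run=subprocess.run):
    """Attempt to set the limit; return True only if it was not rejected."""
    proc = run(
        ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)],
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0 and not limit_rejected(proc.stdout + proc.stderr)
```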

With driver version 535.154.05 on Ubuntu 22.04.3, I can

nvidia-smi --power-limit=150

and reboot to make it take effect.

How have you done this?
I'm also on Ubuntu 22.04 (Linux Mint 21.3) and using the driver Version 535.154.05. Kernel: 5.15.0-94-generic x86_64
NVIDIA GeForce RTX 2080 (Mobile)

sudo nvidia-smi --power-limit=100
Changing power management limit is not supported for GPU: 00000000:01:00.0.
Treating as warning and moving on.
All done.

On Fedora 39 with 545.29.06 drivers, Kernel cachyos-6.7.4-cb1.0 or 6.7.4-200 and nvidia-gpu-firmware 20240115, still can't.

❯ doas nvidia-smi --power-limit=85
Changing power management limit is not supported for GPU: 00000000:01:00.0.
Treating as warning and moving on.
All done.

Adding another datapoint to the mix. Unable to set power limit, GPU stuck at 10 watts max.

Arch Linux, kernel 6.6.6-arch1-1-surface, nvidia-dkms-open version 545.29.06-02.

Downgrading to the last release of nvidia-dkms-open version 525 immediately allowed me to set the power limit past 10 watts.

If this is a Surface Laptop Studio, I found that disabling Runtime D3 or preventing the GPU from entering the D3cold state will keep the GPU at the default TDP of 35W even on driver versions past 525. Obviously this is still not ideal as the 15W of Dynamic Boost is missing (nvidia-powerd does not help). It's also interesting to note that this 10W power limit issue does not happen with Runtime D3 enabled on 525 but it does happen on future releases. (Both on default kernel and Surface Linux kernel)

Lastly, it seems like this whole thing, at least on Surface devices, is mainly because their battery and power systems are handled via the proprietary Surface Aggregator Module and not the standard ACPI interface, which the NVIDIA driver relies on to know if the device is connected to power or running on battery (you can verify this by seeing that acpi_listen will not spit out power events when you unplug and replug the power adapter). You'll probably see in the PowerMizer section of NVIDIA X Server Settings that once the GPU wakes from D3cold, the Power Source readout displays either 'Error' or 'Undersized', which locks the power limit to 10W, and it can't be fixed unless you reboot and prevent the GPU from reentering D3cold. (Also, if you disable Runtime D3, the Power Source readout will always display 'AC', even if you're on battery power.)

This entire issue can of course be avoided entirely by sticking to the 525 driver and manually readjusting the power limit.
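To see whether the GPU is currently allowed to runtime-suspend, the kernel's standard runtime power-management files in sysfs can be read. This is a sketch under assumptions: the PCI address `0000:01:00.0` is the sysfs form of the bus ID shown in the nvidia-smi output in this thread and may differ on other machines, and `runtime_pm_info` is an illustrative helper.

```python
from pathlib import Path


def runtime_pm_info(pci_address="0000:01:00.0", sysfs_root="/sys"):
    """Read the kernel's runtime power-management files for one PCI device."""
    power_dir = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_address / "power"
    info = {}
    for name in ("control", "runtime_status"):
        path = power_dir / name
        # "control" is "auto" (runtime PM allowed) or "on" (device kept awake);
        # "runtime_status" is e.g. "active" or "suspended".
        info[name] = path.read_text().strip() if path.exists() else None
    return info
```

The comment above does not say how Runtime D3 was disabled. One option described in the NVIDIA driver README is the `NVreg_DynamicPowerManagement=0x00` module parameter, mentioned here as a possibility rather than as the method used.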

If this is a Surface Laptop Studio, I found that disabling Runtime D3 or ...

THANK. YOU.

This has been plaguing me for at least half a year now. I thought the massive difference in CUDA performance I saw was because I switched from an X server to Wayland around that time.

Models are getting easily 20 times faster training iterations. Thank you so much @BlueCreeper512 .

Sticking to the 525 driver and manually readjusting the power limit doesn't seem to work for some games. Forza Horizon 5 gets stuck after playing for some time, and the power reported by nvidia-smi is only 8 watts.

Late update. 550.67 works great with Nvidia-powerd enabled on Ubuntu. My mobile 4060 usually has a power limit of 115W or 140W and it is adjusted dynamically. No problem with games anymore.

However, for this issue, nvidia-smi still cannot change the power limit.

Interesting. On my side I'm more interested in having the power limit set to 175W (4090 laptop) at all times. I'm not using my card for games but for deep learning, and it gets stuck at 115W if I use anything other than the 525 drivers. No idea how it's able to adjust dynamically with games though.