magnusja / nvfan

A python library to control nvidia gpu fan to use more aggressive cooling for instance for DL applications. Use at your own risk.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

nvfan

A module to control Nvidia Graphic Cards' fan within python. Features:

  • set constant fan speed
  • set more aggressive fan schedule compared to stock to avoid overhaeting when doing Deep Learning or other computationally intense tasks

Why?

My deep learning rig contains 2 GTX 1080ti graphic cards with no liquid cooling. It takes only a few minutes for the GPUs to hit the thermal threshold of 86°C after I start a training process. Yet, it only uses fans at 50% rate.

This module uses a more aggressive fan speed and therefore avoids overheating, and thus throttling of GPU frequency at around 90° Celsius.

What is special about it?

You only have to add two to three lines to your main Deep Learning python script and then the fan speed is adjusted to keep GPU temperature at max 80° Celsius. When you Deep Learning pipeline exists, the control of the fann speed is automatically given back to the nvidia driver. Hence, fan speed is significantly reduced when finished to reduce noise. Another option is to start nvfan using systemd (see below).

How to use it?

Controlling nvidia gpu fan requires an X server to be running. To run X without having a monitor attached to the system requires special config.

Preparation

Setup x config in a shell like below. You may need to use sudo.

$ nvidia-xconfig --enable-all-gpus --cool-bits=7 --connected-monitor=Monitor0 --allow-empty-initial-configuration --force-generate

Warning: we used --force-generate flag. A backup of your previous config is saved and is reported as the result of running this function.

Aa manual configuration could look like this::

$ cat /etc/X11/xorg.conf.d/nv.conf 
# start

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1080 Ti"
    Option "Coolbits" "4"
EndSection
# trail

See also https://wiki.archlinux.org/index.php/NVIDIA/Tips_and_tricks

Run X

I think the best way is to use xinit:

$ xinit &

Leave this step out if you are using a desktop environment like Gnoome, KDE, or similar.

Dependencies

Please make sure nvdia-smi and nvidia-settings are installed. The latter usually needs to be installed manually whereas the former is usually included in the nvidia driver package.

Install nvfan

$ pip install nvfan

Usage

You can use command line script:

$ nvfan constant -g 0 -s 60  # sets a constant speed at 60%

Or in your python script:

import nvfan

first_gpu = 0
nvfan.constant(first_gpu, 60)

The above script, puts GPU 0 in constant mode with 60% speed. You can use aggressive or driver modes too:

second_gpu = 1

# In aggressive mode, a small increase in temperature causes a large increase in fan speed.
nvfan.aggressive(second_gpu)

# Give control back to the driver manually. Please note that after execution is finished, this line is automatically called so you don't have to.
nvfan.driver(first_gpu)
nvfan.driver(second_gpu)

Instead of using the module you can use the GPU class to have more control (i.e. setting custom X11 display, if not set DISPLAY environment variable is used, or if not set, :0 is used as fallback)

import nvfan

gpu = gpufan.GPU(0, display=":1")  # or use default `None` for automatic lookup of display
gpu.aggressive()

You can also omit the first parameter (device) like so:

import nvfan

gpu = gpufan.GPU()  # or use default `None` for automatic lookup of display
gpu.aggressive()

Then all available GPUs are set to aggressive speed.

Decorators

As another syntactic sugar you can annotate functions, which will set the constant/aggressive speed before calling the decorated method and as soon as it returns will give the control back to the nvidia driver:

import time

from nvfan.decorators import constant, aggressive

@constant(percentage=95)
def main():
    time.sleep(60)


@aggressive()
def main_agg():
    time.sleep(60)


if __name__ == '__main__':
    main()
    main_agg()

systemd service

Create a new service file: sudo vim /etc/systemd/system/nvfan.service

And paste the following content:

[Unit]
Description=aggressive GPU fan speed
StartLimitIntervalSec=0

[Service]
Type=simple
Restart=always
RestartSec=1
User=gdm
Environment=XAUTHORITY=/run/user/121/gdm/Xauthority
ExecStart=/usr/local/bin/nvfan -g 0 1 2 3 -- aggressive
ExecStop=/usr/bin/killall nvfan

[Install]
WantedBy=multi-user.target

This runs an aggressive schedule for GPUs 0 to 3, adapt this to your needs.

If you are running a desktop environment like Gnome or KDE: You need to adapt Environment=XAUTHORITY=/run/user/121/gdm/Xauthority to whatever the user id of your gdm or kdm user is. To do so run the following command:

ps a | grep X

Then you can start the service:

systemctl start nvfan

Make sure it is running (consider using nvtop to check that fan speeds are actually affected):

ps a | grep nvfan

Then if everything works, enable the nvfan.service so it starts automatically at boot:

systemctl enable nvfan.service

About

A python library to control nvidia gpu fan to use more aggressive cooling for instance for DL applications. Use at your own risk.

License:MIT License


Languages

Language:Python 100.0%