The Simplest Way to Control Nvidia GPU Fan Speed in Linux
github.com
Simplest? Perhaps.
Stupidest? Definitely.
What one really wants is a target temperature. There are two pieces I'd use here:
- Baseline open-loop controller based on power usage (integrated over some time; e.g. past 30 seconds)
- Closed-loop controller for fine-tuning with feedback from measured temperature to fan speed (PID is fine, but the integral term is critical)
The latter can be used to periodically calibrate the former.
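As a rough illustration of the two pieces, here is a minimal Python sketch: a feedforward term from power averaged over a window, plus a PI trim on the temperature error. The class name, the gains, the `watts_to_pct` coefficient, and the 20-100% output range are all made-up assumptions for illustration, not values tuned for any real card.

```python
from collections import deque


class FanController:
    """Feedforward from averaged power plus a PI trim on temperature error."""

    def __init__(self, target_c, watts_to_pct=0.25, kp=2.0, ki=0.1,
                 window=30, min_pct=20.0, max_pct=100.0):
        self.target_c = target_c
        self.watts_to_pct = watts_to_pct    # assumed open-loop gain (% per watt)
        self.kp, self.ki = kp, ki           # assumed PI gains
        self.power = deque(maxlen=window)   # at one sample/second, ~30 s of history
        self.integral = 0.0
        self.min_pct, self.max_pct = min_pct, max_pct

    def update(self, power_w, temp_c, dt=1.0):
        # Open-loop baseline: fan percentage predicted from average power.
        self.power.append(power_w)
        baseline = self.watts_to_pct * sum(self.power) / len(self.power)

        # Closed-loop trim: PI on the temperature error.
        error = temp_c - self.target_c
        new_integral = self.integral + error * dt
        raw = baseline + self.kp * error + self.ki * new_integral
        clamped = min(self.max_pct, max(self.min_pct, raw))

        # Anti-windup: keep the integral frozen while the output is saturated
        # and the error would push it further into saturation.
        winding_up = (raw > clamped and error > 0) or (raw < clamped and error < 0)
        if not winding_up:
            self.integral = new_integral
        return clamped
```

In steady state the integral term ends up holding whatever the baseline got wrong, which is one way the closed loop could be used to recalibrate `watts_to_pct` over time.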
You do not want a simple mapping of temperature to fan speed.
The odd issue with the above is that, done wrong, continuously changing fan speeds can be upsetting to users. One can use a discrete approximation with hysteresis to avoid that.
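A minimal Python sketch of that idea, assuming a handful of fixed fan levels and a dead band: the fan steps up as soon as demand exceeds the current level, but only steps down once demand has dropped clearly below the next lower level. The class name, the levels, and the band width are illustrative assumptions.

```python
class SteppedFan:
    """Quantize a continuous fan demand to discrete levels with hysteresis."""

    def __init__(self, levels=(30, 45, 60, 80, 100), band=5.0):
        self.levels = levels   # assumed discrete fan speeds, in percent
        self.band = band       # assumed dead band, in percent
        self.index = 0

    def update(self, requested_pct):
        i = self.index
        # Step up immediately while demand exceeds the current level.
        while i < len(self.levels) - 1 and requested_pct > self.levels[i]:
            i += 1
        # Step down only when demand is below the next lower level by
        # more than the dead band, so small wobbles do not change speed.
        while i > 0 and requested_pct < self.levels[i - 1] - self.band:
            i -= 1
        self.index = i
        return self.levels[i]
```

With these numbers, a demand wobbling between 44% and 46% keeps the fan at a constant 60% instead of flipping between two levels.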
I wrote a very simple script that did linear scaling between 36 and 70 C, and it did a fantastic job of keeping my GPU at reasonable temps. It really doesn't need to be that complicated to function well.
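For reference, linear scaling of that kind fits in a few lines. This is only a sketch of the general approach, not the script described above; the 36 and 70 C endpoints come from the comment, while the function name and the 30% and 100% fan limits are assumptions.

```python
def linear_fan_pct(temp_c, t_low=36.0, t_high=70.0, min_pct=30.0, max_pct=100.0):
    """Map temperature to fan percentage with a clamped linear ramp."""
    if temp_c <= t_low:
        return min_pct
    if temp_c >= t_high:
        return max_pct
    # Fraction of the way from t_low to t_high, scaled onto the fan range.
    frac = (temp_c - t_low) / (t_high - t_low)
    return min_pct + frac * (max_pct - min_pct)
```

At the midpoint of the range, 53 C, this returns 65%.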
For some meanings of "well."
Setting fan speed to max would also do a "fantastic job of keeping my GPU at reasonable temps."
Linear scaling means your fan is not going fast enough at high temps, and is going too fast at low temps. If your GPU is at 30C, the fan should slow down. The GPU will sit at 45C instead, which has no effect on ageing. Likewise, if your target is 45C, your fan isn't going nearly fast enough at high power.
Is there something wrong with the GPU fan speeds in stock configuration?
Yes, on some cards.
I have workstations with dual RTX A6000 that run far too hot (~85 C) and drivers crash with PCIe errors. These are dual slot cards with blower fans, so they are loud, and it seems that NVIDIA chose to let the silicon get extremely hot to keep the noise down. NVIDIA's silicon might be rated to run super hot but the motherboard isn't, and signal integrity apparently suffers.
It's definitely an issue with the cards because the case has excellent airflow and the cards are comfortably spaced.
My solution is to force fan speeds using nvidia-settings. And nvidia-settings requires X11 to run, which is kind of a ridiculous solution for setting fans on headless systems.
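For anyone scripting this, a small Python sketch that builds the nvidia-settings invocation. It assumes the commonly used `GPUFanControlState` and `GPUTargetFanSpeed` attributes, that manual fan control has been enabled in the X configuration (typically through the Coolbits option), and that fan index 0 belongs to GPU 0, which is not guaranteed on every card. The function name is made up.

```python
def fan_speed_command(gpu_index, fan_index, percent):
    """Build an nvidia-settings argv that forces one fan to a fixed speed."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return [
        "nvidia-settings",
        # Switch the GPU's fans from automatic to manual control.
        "-a", f"[gpu:{gpu_index}]/GPUFanControlState=1",
        # Set the target speed for one fan, in percent.
        "-a", f"[fan:{fan_index}]/GPUTargetFanSpeed={int(percent)}",
    ]
```

The resulting list can be passed to `subprocess.run(...)`, with `DISPLAY` pointing at a running X server, since nvidia-settings will not work without one.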
I just use MSI Afterburner.