gimli-org / gimli

Geophysical Inversion and Modeling Library :earth_africa:

Home Page: https://www.pygimli.org

pyGIMLi and NUM_THREADS

prisae opened this issue

Problem description

I can try to ensure that each of my processes uses only one thread by setting my environment variables accordingly:

export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1
export NUMBA_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export NUM_THREADS=1

This works fine, until I do a simple import pygimli. The import fiddles with my settings and makes my processes use several hundred percent of CPU, which can be annoying on shared clusters.

Your environment

Please provide the output of print(pygimli.Report()) here. If that does not
work, please provide some additional information on your:

Operating system: Linux (RHEL 8.9)
Python version: 3.10.13
pyGIMLi version: 1.4.5

--------------------------------------------------------------------------------
  Date: Thu Jun 06 10:30:59 2024 CEST

                OS : Linux
            CPU(s) : 256
           Machine : x86_64
      Architecture : 64bit
               RAM : 1006.8 GiB
       Environment : Jupyter
       File system : ext4

  Python 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37)
  [GCC 12.3.0]

           pygimli : 1.4.5
            pgcore : 1.4.0
             numpy : 1.24.4
        matplotlib : 3.8.2
             scipy : 1.11.4
              tqdm : 4.66.1
           IPython : 8.18.1
           pyvista : 0.43.1

  Intel(R) oneAPI Math Kernel Library Version 2023.2-Product Build 20230613
  for Intel(R) 64 architecture applications
--------------------------------------------------------------------------------

Way of installation: conda

Steps to reproduce

Set all *NUM_THREADS environment variables to 1, import pygimli, and observe that the process uses more than 100% CPU.
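A minimal sketch of these steps from within Python, assuming the variables are set before any numerical library is imported (such libraries typically read them when they are first loaded). The helper name pin_threads is hypothetical. pg.core.threadCount() is the call used later in this thread, and the import is guarded so the script also runs where pygimli is not installed.

```python
import os

# The same variables as in the shell snippet of the problem description.
THREAD_VARS = (
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OMP_NUM_THREADS",
    "NUMBA_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
    "NUM_THREADS",
)


def pin_threads(n=1, env=os.environ):
    """Set every variable in THREAD_VARS to n (hypothetical helper)."""
    for name in THREAD_VARS:
        env[name] = str(n)


# Pin everything to one thread before importing numerical libraries.
pin_threads(1)

try:
    import pygimli as pg

    # On the affected version this reportedly prints more than 1.
    print("pygimli thread count:", pg.core.threadCount())
except ImportError:
    print("pygimli is not installed; only the environment was set")
```

Watching the process in top or htop while a computation runs shows whether the limit is actually respected.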

Expected behavior

I would expect either of the following:

  • pygimli respecting user-defined env variables
  • pygimli indicating when it changes env variables, and providing a way to disable this

This circumvents the issue (mostly):

import pygimli as pg
pg.setThreadCount(1)

but still, I would not expect an import to mess with my variables.
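The workaround can be extended so that it restores whatever the user requested instead of a hard-coded 1. This is only a sketch: the helper requested_threads is hypothetical, it assumes OPENBLAS_NUM_THREADS is the variable of interest, and it reads the value before the import in case the import changes it. pg.setThreadCount is the call shown above.

```python
import os


def requested_threads(env=os.environ, var="OPENBLAS_NUM_THREADS", default=1):
    """Return the thread count requested via env, or default if unset/invalid."""
    try:
        n = int(env.get(var, ""))
    except ValueError:
        return default
    return n if n > 0 else default


# Read the user's request before pygimli gets a chance to change anything.
n_threads = requested_threads()

try:
    import pygimli as pg

    pg.setThreadCount(n_threads)
except ImportError:
    # pygimli not available; nothing to reset.
    pass
```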

For convenience, the core extension of pygimli sets OPENBLAS_NUM_THREADS to the number of CPUs minus 2 right on initialization. You can change it back after importing pygimli with pg.setThreadCount(1).

Maybe we could change this so that it only sets this environment variable if it is not already specified by the user?

I see. But when the number of CPUs is 256 on a shared cluster, that is a very inconvenient default IMHO.

I would prefer what you suggest afterwards: if there is a user-set env variable, it should be respected.

And maybe also a maximum (no need to set the thread count to 254).
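The behaviour discussed here can be sketched in plain Python. This is not the pygimli core implementation, which is not shown in this thread. The function name default_thread_count is hypothetical, and the cap of 16 is an assumption chosen for illustration.

```python
import os

MAX_DEFAULT_THREADS = 16  # assumed cap, not taken from the pygimli source


def default_thread_count(env=None, cpu_count=None, cap=MAX_DEFAULT_THREADS):
    """Pick a thread count: respect the user's setting, else a capped default."""
    env = os.environ if env is None else env

    # 1. A valid, positive user-set value always wins.
    raw = env.get("OPENBLAS_NUM_THREADS")
    if raw is not None:
        try:
            n = int(raw)
            if n > 0:
                return n
        except ValueError:
            pass  # fall through to the default for unparsable values

    # 2. Otherwise use (CPUs - 2), but never less than 1 or more than cap.
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return max(1, min(cpu_count - 2, cap))


print(default_thread_count({"OPENBLAS_NUM_THREADS": "12"}, cpu_count=256))
print(default_thread_count({}, cpu_count=256))
```

With these assumptions, the 256-CPU machine from the report would get 12 threads when the variable is set to 12, and 16 when it is unset.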

Changed the default behaviour. It will be live after the next core update.

export OPENBLAS_NUM_THREADS=12 && python -c 'import pygimli as pg; print(pg.core.threadCount())'
12
unset OPENBLAS_NUM_THREADS && python -c 'import pygimli as pg; print(pg.core.threadCount())'
16

Great, I like it, thanks for the quick turnaround! 8 or 16 was exactly what our HPC expert here also suggested as a maximum, as OpenBLAS won't be much more efficient beyond that number.