NVIDIA / earth2mip

Earth-2 Model Intercomparison Project (MIP) is a python framework that enables climate researchers and scientists to inter-compare AI models for weather and climate.

Home Page: https://nvidia.github.io/earth2mip/

πŸ›[BUG]: Not able to load models on CPU

adamjstewart opened this issue · comments

Version

0.1.0

On which installation method(s) does this occur?

Source

Describe the issue

The default behavior of earth2mip.networks.get_model(...) is to load the model on the CPU. However, this default fails:

>>> from earth2mip.networks import get_model
>>> m = get_model("e2mip://dlwp")
/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-ecmwflibs-0.6.1-ir2ftnwp2o644qfwe3pxs46i7xznemce/lib/python3.10/site-packages/ecmwflibs/__init__.py:144: UserWarning: ecmwflibs: using provided 'ECMWFLIBS_ECCODES' set to '/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/eccodes-2.34.0-5mcekl7wxd72hrvaq7pmv4lw7cyoqoou/lib/libeccodes.so
  warnings.warn(
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-earth2mip-0.1.0-eg37aficdacia2bylnpyisnznioeuoo2/lib/python3.10/site-packages/earth2mip/networks/__init__.py", line 345, in get_model
    return _load_package_builtin(package, device, name=url.netloc)
  File "/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-earth2mip-0.1.0-eg37aficdacia2bylnpyisnznioeuoo2/lib/python3.10/site-packages/earth2mip/networks/__init__.py", line 291, in _load_package_builtin
    return inference_loader(package, device=device)
  File "/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-earth2mip-0.1.0-eg37aficdacia2bylnpyisnznioeuoo2/lib/python3.10/site-packages/earth2mip/networks/dlwp.py", line 155, in load
    with torch.cuda.device(device):
  File "/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-torch-2.2.1-yni5thh74mlihxkm4dm7ap5wtafhornn/lib/python3.10/site-packages/torch/cuda/__init__.py", line 370, in __init__
    self.idx = _get_device_index(device, optional=True)
  File "/home/adam/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/py-torch-2.2.1-yni5thh74mlihxkm4dm7ap5wtafhornn/lib/python3.10/site-packages/torch/cuda/_utils.py", line 34, in _get_device_index
    raise ValueError(f"Expected a cuda device, but got: {device}")
ValueError: Expected a cuda device, but got: cpu

It seems the models are hard-coded to load on CUDA, even when no CUDA device is available (CPU-only, ROCm, MPS, etc.).

Environment details

Linux Ubuntu 22.04

Thanks. I agree this seems like a bug in the dlwp loader.

I'm not sure whether other models are affected.

cc @ktangsali @NickGeneva