RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH on WSL2 Ubuntu 24.04
cospotato opened this issue
Hi, I am new to deep learning. It works on Windows with CUDA 12.5 and cuDNN 9.3.0. Then I tried to run it on WSL2 (Ubuntu 24.04) with the config below and got the error RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH. What am I missing?
OS: WSL2 Ubuntu 24.04
Kernel: Linux cospotato 5.15.167.4-microsoft-standard-WSL2 #1 SMP Tue Nov 5 00:21:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
PyTorch Version: 2.5.1
CUDA version: 12.6
cuDNN version: 9.3.0
Traceback:
Traceback (most recent call last):
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/diarize.py", line 199, in <module>
msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to(args.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 994, in __init__
self._init_msdd_model(cfg)
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 1096, in _init_msdd_model
self.msdd_model = EncDecDiarLabelModel.from_pretrained(model_name=model_path, map_location=cfg.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/classes/common.py", line 754, in from_pretrained
instance = class_.restore_from(
^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/classes/modelPT.py", line 464, in restore_from
instance = cls._save_restore_connector.restore_from(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 255, in restore_from
loaded_params = self.load_config_and_state_dict(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 179, in load_config_and_state_dict
instance = instance.to(map_location)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 55, in to
return super().to(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1340, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
module._apply(fn)
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
module._apply(fn)
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/rnn.py", line 288, in _apply
self._init_flat_weights()
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/rnn.py", line 215, in _init_flat_weights
self.flatten_parameters()
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/rnn.py", line 269, in flatten_parameters
torch._cudnn_rnn_flatten_weight(
RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH
Additional: if I run the NeMo MSDD diarization model section alone, it works. Maybe NeMo conflicts with Whisper?
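For anyone debugging this outside of NeMo/Whisper, here is a minimal sketch that exercises the same call path as the traceback above: moving an nn.LSTM to the GPU goes through flatten_parameters() and torch._cudnn_rnn_flatten_weight, so it should reproduce the mismatch if the environment (rather than this project) is the problem.

```python
# Minimal reproducer sketch for the failing call path in the traceback above.
# Moving an LSTM to the GPU runs _init_flat_weights -> flatten_parameters ->
# torch._cudnn_rnn_flatten_weight, which is where the cuDNN error surfaces.
import torch

lstm = torch.nn.LSTM(input_size=10, hidden_size=20)
lstm = lstm.to("cuda")  # raises the cuDNN mismatch if the environment is broken
print("LSTM moved to GPU without a cuDNN error")
```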
@cospotato did you manage to work this out? I am having exactly the same issue, RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH, on a bare-metal RHEL 9 server.
System
OS: Red Hat Enterprise Linux release 9.4 (Plow)
Kernel: 5.14.0-427.35.1.el9_4.x86_64
GPU: Nvidia A30 24GB
CUDA: 12.4.r12.4/compiler.34097967_0
cuDNN: 9.6.0.74
Python: Python 3.12.1
running in venv
torch: 2.5.1
Traceback:
Traceback (most recent call last):
File "/srv/whisperAI/whisper-diarization/diarize.py", line 202, in <module>
msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to(args.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 994, in __init__
self._init_msdd_model(cfg)
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 1096, in _init_msdd_model
self.msdd_model = EncDecDiarLabelModel.from_pretrained(model_name=model_path, map_location=cfg.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/classes/common.py", line 754, in from_pretrained
instance = class_.restore_from(
^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/classes/modelPT.py", line 464, in restore_from
instance = cls._save_restore_connector.restore_from(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 255, in restore_from
loaded_params = self.load_config_and_state_dict(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 179, in load_config_and_state_dict
instance = instance.to(map_location)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 55, in to
return super().to(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1340, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
module._apply(fn)
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
module._apply(fn)
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/rnn.py", line 288, in _apply
self._init_flat_weights()
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/rnn.py", line 215, in _init_flat_weights
self.flatten_parameters()
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/rnn.py", line 269, in flatten_parameters
torch._cudnn_rnn_flatten_weight(
RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH
Clearly this is a CUDA issue, but I cannot work out what is going on. I assume it is a PyTorch thing.
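A quick sanity check (assuming it is run from the same venv diarize.py uses) is to print what this torch build was compiled against versus what it sees at runtime; a mismatch there is consistent with CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH.

```python
# Sanity-check sketch: what this torch build expects vs. what it finds at runtime.
import torch

print("torch:", torch.__version__)
print("CUDA torch was built with:", torch.version.cuda)
print("cuDNN version torch reports:", torch.backends.cudnn.version())
print("CUDA available:", torch.cuda.is_available())
```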
OK, quick update... diarize.py -a audio.MP3 is still causing the issue above. HOWEVER, diarize_parallel.py -a audio.MP3 runs and transcribes the audio to text and SRT with a good level of activity, BUT the speaker identification does not work. I don't know if that helps or confuses things, but I thought I would share it.
EDIT: I think the post below is actually just a set of warnings and is unrelated to diarize.py not running on Linux.
@cospotato just out of interest, did you get a warning directly before this error when calling diarize.py, about tarfile.py:2252 not being allowed to use absolute paths anymore?
[NeMo W 2024-12-12 15:17:10 nemo_logging:393] /usr/lib64/python3.12/tarfile.py:2252: RuntimeWarning: The default behavior of tarfile extraction has been changed to disallow common exploits (including CVE-2007-4559). By default, absolute/parent paths are disallowed and some mode bits are cleared. See https://access.redhat.com/articles/7004769 for more details.
warnings.warn(
[NeMo W 2024-12-12 15:17:11 nemo_logging:393] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
Train config :
manifest_filepath: null
emb_dir: null
sample_rate: 16000
num_spks: 2
soft_label_thres: 0.5
labels: null
batch_size: 15
emb_batch_size: 0
shuffle: true
[NeMo W 2024-12-12 15:17:11 nemo_logging:393] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
Validation config :
manifest_filepath: null
emb_dir: null
sample_rate: 16000
num_spks: 2
soft_label_thres: 0.5
labels: null
batch_size: 15
emb_batch_size: 0
shuffle: false
[NeMo W 2024-12-12 15:17:11 nemo_logging:393] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
Test config :
manifest_filepath: null
emb_dir: null
sample_rate: 16000
num_spks: 2
soft_label_thres: 0.5
labels: null
batch_size: 15
emb_batch_size: 0
shuffle: false
seq_eval_mode: false
Same issue, is this resolved? @DrJPK @cospotato
@sadathknorket not resolved, but for some reason that I can't quite explain, the diarize_parallel.py script runs without this error for me. Unfortunately, that parallel script seems to label everything as speaker 0, so it's not working perfectly, but it is transcribing and completing. I'm thinking something upstream in NeMo has changed, causing this issue.
Hi there. I faced the same issue (not WSL, standalone Ubuntu 24.04). Inside the conda environment:
pip install -U nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12
pip throws a dependency error for PyTorch:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. torch 2.5.1 requires nvidia-cuda-runtime-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cuda-runtime-cu12 12.6.77 which is incompatible. torch 2.5.1 requires nvidia-cudnn-cu12==9.1.0.70; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cudnn-cu12 9.6.0.74 which is incompatible.
... but the packages are installed successfully, and no more CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH exceptions are thrown for diarize.py.
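For anyone checking the same drift in their own environment, a small stdlib-only sketch can list the installed nvidia-* wheels next to the exact versions torch declares as requirements (the same information the pip resolver complains about above):

```python
# Sketch: compare installed nvidia-* wheels with torch's own pinned requirements.
# Stdlib only (importlib.metadata, Python 3.8+).
from importlib.metadata import distributions, requires

installed = {dist.metadata["Name"]: dist.version
             for dist in distributions()
             if (dist.metadata["Name"] or "").startswith("nvidia-")}
for name, version in sorted(installed.items()):
    print(f"installed: {name}=={version}")

# torch's declared requirements pin the nvidia-cudnn-cu12 / nvidia-cuda-runtime-cu12
# versions it was built and tested against.
for requirement in requires("torch") or []:
    if "nvidia" in requirement:
        print("torch wants:", requirement)
```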
I can verify I had the SAME issue, applied the "fix" above, got the SAME error but the same successful install, and the cuDNN mismatch was resolved. Very weird, but all's well that ends well.
Is there no solution for this yet?
This is not an issue that will be solved in this project; you just need to configure all your CUDA libraries correctly, which can be hard.
The solution that worked on Colab is to uninstall nvidia-cudnn-cu12; this error usually means that you have two cuDNN installations on your system.
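One way to confirm the "two cuDNN installations" diagnosis on Linux is to check which libcudnn copies the running process actually maps; seeing paths from both a pip wheel (site-packages/nvidia/cudnn) and a system location (/usr/lib, /usr/local/cuda) would explain the mismatch. A rough sketch:

```python
# Sketch (Linux-only): list the libcudnn shared objects mapped into this process.
# The conv2d call is just a cheap way to force cuDNN to be loaded lazily.
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8, device="cuda")
w = torch.randn(1, 1, 3, 3, device="cuda")
F.conv2d(x, w)  # convolutions on CUDA tensors typically go through cuDNN

with open("/proc/self/maps") as maps:
    cudnn_paths = {line.split()[-1] for line in maps if "libcudnn" in line}
for path in sorted(cudnn_paths):
    print(path)
```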