RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH on WSL2 Ubuntu 24.04
cospotato opened this issue
Hi, I am new to deep learning. It works on Windows with CUDA 12.5 and cuDNN 9.3.0. Then I tried to run it on WSL2 (Ubuntu 24.04) with the config below and got the error RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH. What am I missing?
OS: WSL2 Ubuntu 24.04
Kernel: Linux cospotato 5.15.167.4-microsoft-standard-WSL2 #1 SMP Tue Nov 5 00:21:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
PyTorch Version: 2.5.1
CUDA version: 12.6
cuDNN version: 9.3.0
Traceback:
Traceback (most recent call last):
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/diarize.py", line 199, in <module>
msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to(args.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 994, in __init__
self._init_msdd_model(cfg)
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 1096, in _init_msdd_model
self.msdd_model = EncDecDiarLabelModel.from_pretrained(model_name=model_path, map_location=cfg.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/classes/common.py", line 754, in from_pretrained
instance = class_.restore_from(
^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/classes/modelPT.py", line 464, in restore_from
instance = cls._save_restore_connector.restore_from(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 255, in restore_from
loaded_params = self.load_config_and_state_dict(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 179, in load_config_and_state_dict
instance = instance.to(map_location)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 55, in to
return super().to(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1340, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
module._apply(fn)
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
module._apply(fn)
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/rnn.py", line 288, in _apply
self._init_flat_weights()
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/rnn.py", line 215, in _init_flat_weights
self.flatten_parameters()
File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/rnn.py", line 269, in flatten_parameters
torch._cudnn_rnn_flatten_weight(
RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH
Additional: if I run the NeMo MSDD diarization model section alone, it works. Maybe NeMo conflicts with Whisper?
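For anyone debugging this outside of NeMo/Whisper, here is a minimal sketch that exercises the same call path as the traceback above: moving an nn.LSTM to the GPU goes through flatten_parameters() and torch._cudnn_rnn_flatten_weight, so it should reproduce the mismatch if the environment (rather than this project) is the problem.

```python
# Minimal reproducer sketch for the failing call path in the traceback above.
# Moving an LSTM to the GPU runs _init_flat_weights -> flatten_parameters ->
# torch._cudnn_rnn_flatten_weight, which is where the cuDNN error surfaces.
import torch

lstm = torch.nn.LSTM(input_size=10, hidden_size=20)
lstm = lstm.to("cuda")  # raises the cuDNN mismatch if the environment is broken
print("LSTM moved to GPU without a cuDNN error")
```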
@cospotato did you manage to work this out? I am having exactly the same issue, RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH, on a bare-metal RHEL 9 server.
System
OS: Red Hat Enterprise Linux release 9.4 (Plow)
Kernel: 5.14.0-427.35.1.el9_4.x86_64
GPU: Nvidia A30 24GB
CUDA: 12.4.r12.4/compiler.34097967_0
cuDNN: 9.6.0.74
Python: Python 3.12.1
running in venv
torch: 2.5.1
Traceback:
Traceback (most recent call last):
File "/srv/whisperAI/whisper-diarization/diarize.py", line 202, in <module>
msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to(args.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 994, in __init__
self._init_msdd_model(cfg)
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 1096, in _init_msdd_model
self.msdd_model = EncDecDiarLabelModel.from_pretrained(model_name=model_path, map_location=cfg.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/classes/common.py", line 754, in from_pretrained
instance = class_.restore_from(
^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/classes/modelPT.py", line 464, in restore_from
instance = cls._save_restore_connector.restore_from(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 255, in restore_from
loaded_params = self.load_config_and_state_dict(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 179, in load_config_and_state_dict
instance = instance.to(map_location)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 55, in to
return super().to(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1340, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
module._apply(fn)
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
module._apply(fn)
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/rnn.py", line 288, in _apply
self._init_flat_weights()
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/rnn.py", line 215, in _init_flat_weights
self.flatten_parameters()
File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/rnn.py", line 269, in flatten_parameters
torch._cudnn_rnn_flatten_weight(
RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH
Clearly this is a CUDA issue, but I cannot work out what is going on. I assume it is a PyTorch thing.
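A quick sanity check (assuming it is run from the same venv diarize.py uses) is to print what this torch build was compiled against versus what it sees at runtime; a mismatch there is consistent with CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH.

```python
# Sanity-check sketch: what this torch build expects vs. what it finds at runtime.
import torch

print("torch:", torch.__version__)
print("CUDA torch was built with:", torch.version.cuda)
print("cuDNN version torch reports:", torch.backends.cudnn.version())
print("CUDA available:", torch.cuda.is_available())
```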
OK, quick update... diarize.py -a audio.MP3 is still causing the issue above. HOWEVER, diarize_parallel.py -a audio.MP3 runs and transcribes the audio to text and SRT with a good level of activity, BUT the speaker identification does not work. I don't know if that helps or confuses things, but I thought I would share it.
EDIT: I think the post below is actually just a set of warnings and is unrelated to diarize.py not running on Linux.
@cospotato just out of interest, did you get a warning directly before this error when calling diarize.py, about tarfile.py:2252 not being allowed to use absolute paths anymore?
[NeMo W 2024-12-12 15:17:10 nemo_logging:393] /usr/lib64/python3.12/tarfile.py:2252: RuntimeWarning: The default behavior of tarfile extraction has been changed to disallow common exploits (including CVE-2007-4559). By default, absolute/parent paths are disallowed and some mode bits are cleared. See https://access.redhat.com/articles/7004769 for more details.
warnings.warn(
[NeMo W 2024-12-12 15:17:11 nemo_logging:393] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
Train config :
manifest_filepath: null
emb_dir: null
sample_rate: 16000
num_spks: 2
soft_label_thres: 0.5
labels: null
batch_size: 15
emb_batch_size: 0
shuffle: true
[NeMo W 2024-12-12 15:17:11 nemo_logging:393] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
Validation config :
manifest_filepath: null
emb_dir: null
sample_rate: 16000
num_spks: 2
soft_label_thres: 0.5
labels: null
batch_size: 15
emb_batch_size: 0
shuffle: false
[NeMo W 2024-12-12 15:17:11 nemo_logging:393] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
Test config :
manifest_filepath: null
emb_dir: null
sample_rate: 16000
num_spks: 2
soft_label_thres: 0.5
labels: null
batch_size: 15
emb_batch_size: 0
shuffle: false
seq_eval_mode: false
Same issue, is this resolved? @DrJPK @cospotato
@sadathknorket not resolved, but for some reason that I can't quite explain, the diarize_parallel.py script runs without this error for me. Unfortunately, that parallel script seems to label everything as speaker 0, so it's not working perfectly, but it is transcribing and completing. I'm thinking something upstream in NeMo has changed, causing this issue.
Hi there. I faced the same issue (not WSL, standalone Ubuntu 24.04). Inside the conda environment:
pip install -U nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12
pip throws a dependency error for PyTorch:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. torch 2.5.1 requires nvidia-cuda-runtime-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cuda-runtime-cu12 12.6.77 which is incompatible. torch 2.5.1 requires nvidia-cudnn-cu12==9.1.0.70; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cudnn-cu12 9.6.0.74 which is incompatible.
... but the packages are installed successfully, and no more CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH exceptions are thrown for diarize.py.
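For anyone checking the same drift in their own environment, a small stdlib-only sketch can list the installed nvidia-* wheels next to the exact versions torch declares as requirements (the same information the pip resolver complains about above):

```python
# Sketch: compare installed nvidia-* wheels with torch's own pinned requirements.
# Stdlib only (importlib.metadata, Python 3.8+).
from importlib.metadata import distributions, requires

installed = {dist.metadata["Name"]: dist.version
             for dist in distributions()
             if (dist.metadata["Name"] or "").startswith("nvidia-")}
for name, version in sorted(installed.items()):
    print(f"installed: {name}=={version}")

# torch's declared requirements pin the nvidia-cudnn-cu12 / nvidia-cuda-runtime-cu12
# versions it was built and tested against.
for requirement in requires("torch") or []:
    if "nvidia" in requirement:
        print("torch wants:", requirement)
```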
I can verify I had the SAME issue, applied the "fix" above, got the SAME error but the same successful install, and the cuDNN mismatch was resolved. Very weird, but all's well that ends well.
Is there no solution for this yet?
This is not an issue that will be solved in this project; you just need to configure all your CUDA libraries correctly, which can be hard.
The solution that worked on Colab is to uninstall nvidia-cudnn-cu12; this error usually means that you have two cuDNN installations on your system.
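One way to confirm the "two cuDNN installations" diagnosis on Linux is to check which libcudnn copies the running process actually maps; seeing paths from both a pip wheel (site-packages/nvidia/cudnn) and a system location (/usr/lib, /usr/local/cuda) would explain the mismatch. A rough sketch:

```python
# Sketch (Linux-only): list the libcudnn shared objects mapped into this process.
# The conv2d call is just a cheap way to force cuDNN to be loaded lazily.
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8, device="cuda")
w = torch.randn(1, 1, 3, 3, device="cuda")
F.conv2d(x, w)  # convolutions on CUDA tensors typically go through cuDNN

with open("/proc/self/maps") as maps:
    cudnn_paths = {line.split()[-1] for line in maps if "libcudnn" in line}
for path in sorted(cudnn_paths):
    print(path)
```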