MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper


RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH on WSL2 Ubuntu 24.04

cospotato opened this issue · comments

Hi, I am new to deep learning. This works on Windows with CUDA 12.5 and cuDNN 9.3.0, but when I tried to run it on WSL2 (Ubuntu 24.04) with the configuration below, I got RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH. What am I missing?

OS: WSL2 Ubuntu 24.04
Kernel: Linux cospotato 5.15.167.4-microsoft-standard-WSL2 #1 SMP Tue Nov 5 00:21:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
PyTorch Version: 2.5.1
CUDA version: 12.6
cudnn version: 9.3.0

Traceback:

Traceback (most recent call last):
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/diarize.py", line 199, in <module>
    msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to(args.device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 994, in __init__
    self._init_msdd_model(cfg)
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 1096, in _init_msdd_model
    self.msdd_model = EncDecDiarLabelModel.from_pretrained(model_name=model_path, map_location=cfg.device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/classes/common.py", line 754, in from_pretrained
    instance = class_.restore_from(
               ^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/classes/modelPT.py", line 464, in restore_from
    instance = cls._save_restore_connector.restore_from(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 255, in restore_from
    loaded_params = self.load_config_and_state_dict(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 179, in load_config_and_state_dict
    instance = instance.to(map_location)
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 55, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/rnn.py", line 288, in _apply
    self._init_flat_weights()
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/rnn.py", line 215, in _init_flat_weights
    self.flatten_parameters()
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/rnn.py", line 269, in flatten_parameters
    torch._cudnn_rnn_flatten_weight(
RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH

Additional: if I run the NeMo MSDD diarization model section alone, it works. Maybe NeMo conflicts with Whisper?
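
One way to narrow this down is to check which cuDNN build PyTorch actually loads at runtime, since it can differ from the system-wide install; a minimal sketch using standard PyTorch attributes, run inside the same venv:

    # Compare the CUDA/cuDNN versions PyTorch was built against
    # with what it loads at runtime in this environment.
    import torch

    print("torch:", torch.__version__)                      # e.g. 2.5.1
    print("built with CUDA:", torch.version.cuda)           # e.g. 12.4
    print("cuDNN loaded:", torch.backends.cudnn.version())  # e.g. 90100 -> 9.1.0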

commented

@cospotato did you manage to work this out? I am having exactly the same issue, RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH, on a bare-metal RHEL 9 server.

System
OS: Red Hat Enterprise Linux release 9.4 (Plow)
Kernel: 5.14.0-427.35.1.el9_4.x86_64
GPU: Nvidia A30 24GB
CUDA: 12.4.r12.4/compiler.34097967_0
cuDNN: 9.6.0.74
Python: Python 3.12.1 running in venv
torch: 2.5.1

Traceback:

Traceback (most recent call last):
  File "/srv/whisperAI/whisper-diarization/diarize.py", line 202, in <module>
    msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to(args.device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 994, in __init__
    self._init_msdd_model(cfg)
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 1096, in _init_msdd_model
    self.msdd_model = EncDecDiarLabelModel.from_pretrained(model_name=model_path, map_location=cfg.device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/classes/common.py", line 754, in from_pretrained
    instance = class_.restore_from(
               ^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/classes/modelPT.py", line 464, in restore_from
    instance = cls._save_restore_connector.restore_from(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 255, in restore_from
    loaded_params = self.load_config_and_state_dict(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 179, in load_config_and_state_dict
    instance = instance.to(map_location)
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 55, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/rnn.py", line 288, in _apply
    self._init_flat_weights()
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/rnn.py", line 215, in _init_flat_weights
    self.flatten_parameters()
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/rnn.py", line 269, in flatten_parameters
    torch._cudnn_rnn_flatten_weight(
RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH

Clearly this is a CUDA issue, but I cannot work out what is going on. I assume it is a PyTorch thing.

commented

OK, quick update... diarize.py -a audio.MP3 still causes the issue above. HOWEVER, diarize_parallel.py -a audio.MP3 runs and transcribes the audio to text and SRT with a good level of accuracy, BUT the speaker identification does not work. I don't know if that helps or confuses things, but I thought I would share it.

commented

EDIT: I think the post below is actually just a set of warnings and is unrelated to diarize.py not running on Linux.

@cospotato just out of interest, did you get a warning directly before this error when calling diarize.py, from tarfile.py:2252, about absolute paths no longer being allowed?

[NeMo W 2024-12-12 15:17:10 nemo_logging:393] /usr/lib64/python3.12/tarfile.py:2252: RuntimeWarning: The default behavior of tarfile extraction has been changed to disallow common exploits (including CVE-2007-4559). By default, absolute/parent paths are disallowed and some mode bits are cleared. See https://access.redhat.com/articles/7004769 for more details.
      warnings.warn(

[NeMo W 2024-12-12 15:17:11 nemo_logging:393] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config :
    manifest_filepath: null
    emb_dir: null
    sample_rate: 16000
    num_spks: 2
    soft_label_thres: 0.5
    labels: null
    batch_size: 15
    emb_batch_size: 0
    shuffle: true

[NeMo W 2024-12-12 15:17:11 nemo_logging:393] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
    Validation config :
    manifest_filepath: null
    emb_dir: null
    sample_rate: 16000
    num_spks: 2
    soft_label_thres: 0.5
    labels: null
    batch_size: 15
    emb_batch_size: 0
    shuffle: false

[NeMo W 2024-12-12 15:17:11 nemo_logging:393] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
    Test config :
    manifest_filepath: null
    emb_dir: null
    sample_rate: 16000
    num_spks: 2
    soft_label_thres: 0.5
    labels: null
    batch_size: 15
    emb_batch_size: 0
    shuffle: false
    seq_eval_mode: false

Same issue here. Is this resolved? @DrJPK @cospotato

commented

@sadathknorket Not resolved, but for some reason that I can't quite explain, the diarize_parallel.py script runs without this error for me. Unfortunately, that parallel script seems to label everything as speaker 0, so it's not working perfectly, but it is transcribing and completing. I'm thinking something upstream in NeMo has changed, causing this issue.

Hi there. I faced the same issue (not WSL, standalone Ubuntu 24.04). Inside a conda environment:

pip install -U nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12

pip throws a dependency error for PyTorch:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. torch 2.5.1 requires nvidia-cuda-runtime-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cuda-runtime-cu12 12.6.77 which is incompatible. torch 2.5.1 requires nvidia-cudnn-cu12==9.1.0.70; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cudnn-cu12 9.6.0.74 which is incompatible.

... but the packages are installed successfully, and diarize.py no longer throws the CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH exception.
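
To confirm what the upgrade actually pulled into the environment, a quick check with the standard-library importlib.metadata (a sketch; package names as in the pip command above):

    # List the installed versions of the NVIDIA wheels and torch
    # to see which CUDA runtime / cuDNN the environment resolved to.
    from importlib.metadata import PackageNotFoundError, version

    for pkg in ("nvidia-cuda-runtime-cu12", "nvidia-cudnn-cu12", "torch"):
        try:
            print(pkg, version(pkg))
        except PackageNotFoundError:
            print(pkg, "not installed")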

I can verify I had the SAME issue, applied the "fix" above, got the SAME pip dependency error but the same successful install, and the cuDNN mismatch was resolved. Very weird, but all's well that ends well.

Is there no solution for this yet?

This is not an issue that will be solved in this project; you just need to configure all your CUDA libraries correctly, which can be hard.

The solution that worked on Colab is to uninstall nvidia-cudnn-cu12; this error usually means that you have two cuDNN installations on your system.
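
One way to check for two installations is to search the usual locations for libcudnn copies; a minimal sketch (the system paths are illustrative and vary by distro):

    # Look for multiple libcudnn copies (pip wheel vs. system install),
    # a common cause of CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH.
    import glob
    import site

    patterns = [f"{root}/nvidia/cudnn/lib/libcudnn*.so*" for root in site.getsitepackages()]
    patterns += [
        "/usr/lib/x86_64-linux-gnu/libcudnn*.so*",  # Debian/Ubuntu system packages
        "/usr/lib64/libcudnn*.so*",                 # RHEL system packages
        "/usr/local/cuda*/lib64/libcudnn*.so*",     # local CUDA toolkit installs
    ]
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            print(path)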