modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Error occurred during "bash run.sh" for speaker diarization

NathanJHLee opened this issue · comments

Hi, my name is Nathan. I am trying to test 3D-Speaker to get an RTTM file from the pretrained model on ModelScope,
but I get the error below.

(3D-Speaker) [asr@0419bb3cf325 speaker-diarization]$ bash run.sh
Stage 1: Prepare input wavs...
--2024-02-05 09:07:39-- https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/2speakers_example.wav
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2528044 (2.4M) [application/octet-stream]
Saving to: 'examples/2speakers_example.wav'

100%[===========================================================================>] 2,528,044 831KB/s in 3.0s

2024-02-05 09:07:43 (831 KB/s) - 'examples/2speakers_example.wav' saved [2528044/2528044]

--2024-02-05 09:07:43-- https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/2speakers_example.rttm
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 380 [application/octet-stream]
Saving to: 'examples/2speakers_example.rttm'

100%[===========================================================================>] 380 --.-K/s in 0s

2024-02-05 09:07:44 (40.0 MB/s) - 'examples/2speakers_example.rttm' saved [380/380]

Stage2: Do vad for input wavs...
2024-02-05 09:07:46,885 - modelscope - INFO - PyTorch version 1.13.1 Found.
2024-02-05 09:07:46,886 - modelscope - INFO - Loading ast index from /home/asr/.cache/modelscope/ast_indexer
2024-02-05 09:07:47,056 - modelscope - INFO - Updating the files for the changes of local files, first time updating will take longer time! Please wait till updating done!
2024-02-05 09:07:47,083 - modelscope - INFO - AST-Scanning the path "/home/asr/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/modelscope" with the following sub folders ['models', 'metrics', 'pipelines', 'preprocessors', 'trainers', 'msdatasets', 'exporters']
2024-02-05 09:08:18,037 - modelscope - INFO - Scanning done! A number of 964 components indexed or updated! Time consumed 30.954344987869263s
2024-02-05 09:08:18,114 - modelscope - INFO - Loading done! Current index file version is 1.12.0, with md5 ccb085697b83dbefd09232fac3402a63 and a total number of 964 components indexed
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please Requires the ffmpeg CLI and ffmpeg-python package to be installed.
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
2024-02-05 09:08:22,477 - modelscope - WARNING - Model revision not specified, use revision: v2.0.4
2024-02-05 09:08:22,825 - modelscope - INFO - initiate model from /home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-02-05 09:08:22,826 - modelscope - INFO - initiate model from location /home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch.
2024-02-05 09:08:22,827 - modelscope - INFO - initialize model from /home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-02-05 09:08:22,874 - modelscope - WARNING - No preprocessor field found in cfg.
2024-02-05 09:08:22,875 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-02-05 09:08:22,875 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch'}. trying to build by task and model information.
2024-02-05 09:08:22,875 - modelscope - WARNING - No preprocessor key ('funasr', 'voice-activity-detection') found in PREPROCESSOR_MAP, skip building preprocessor.
2024-02-05 09:08:22,876 - modelscope - INFO - cuda is not available, using cpu instead.
[INFO]: Start computing VAD...
rtf_avg: 0.043: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.22it/s]
Traceback (most recent call last):
File "local/voice_activity_detection.py", line 90, in <module>
main()
File "local/voice_activity_detection.py", line 71, in main
for vad_t in vad_time['text']:
TypeError: list indices must be integers or slices, not str

If I print "vad_time", I get:
[{'key': 'rand_key_2yW4Acq9GFz6Y', 'value': [[5240, 29010], [29290, 37360], [37640, 67570], [67860, 78980]]}]

I don't understand what the 'text' key is supposed to mean here, since the result has no such key.
Could you please look into this problem?
Thank you.
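
For readers hitting the same TypeError: the printed result is a list containing one dict with a 'value' key, while the failing line indexes it as a dict with a 'text' key. The following is a minimal sketch, not the repository's code, of a helper that accepts either shape. The function name extract_vad_segments is hypothetical, and it is an assumption that the older 'text' field held the same [start, end] pairs and that the numbers are milliseconds.

```python
def extract_vad_segments(vad_result):
    """Return a flat list of [start, end] pairs from a VAD result.

    Hypothetical helper: handles the list-of-dicts shape printed in this
    thread and, as an assumption, a dict shape with a 'text' key.
    """
    # Shape printed in this thread: [{'key': ..., 'value': [[start, end], ...]}]
    if isinstance(vad_result, list):
        segments = []
        for item in vad_result:
            segments.extend(item.get("value", []))
        return segments
    # Shape implied by the failing line: {'text': [[start, end], ...]} (assumed)
    if isinstance(vad_result, dict):
        return list(vad_result.get("text", vad_result.get("value", [])))
    raise TypeError(f"unexpected VAD result type: {type(vad_result).__name__}")


# The value printed in the report above.
vad_time = [{'key': 'rand_key_2yW4Acq9GFz6Y',
             'value': [[5240, 29010], [29290, 37360],
                       [37640, 67570], [67860, 78980]]}]

for start, end in extract_vad_segments(vad_time):
    # Assuming milliseconds, convert to seconds for display.
    print(f"{start / 1000:.2f}s - {end / 1000:.2f}s")
```

This only works around the mismatch in the calling script; pinning the package versions the script was written against is the fix the maintainers suggest.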

We revised the requirements for speaker diarization:

numba==0.56.2
umap-learn
funasr==0.8.4
modelscope==1.10.0
hdbscan

Please try again with these versions, and feel free to ask if you have further questions.
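
Before rerunning, it can help to confirm that the active environment actually matches the pins listed above. Here is a small sketch using only the standard library (importlib.metadata, available from Python 3.8). The names PINS, installed_version and check_pins are illustrative, not part of 3D-Speaker; packages listed without a version are only checked for presence.

```python
from importlib import metadata

# Pins copied from the revised requirements in this thread.
# None means "any version, as long as it is installed".
PINS = {
    "numba": "0.56.2",
    "umap-learn": None,
    "funasr": "0.8.4",
    "modelscope": "1.10.0",
    "hdbscan": None,
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return a list of human-readable mismatches (empty if all pins match)."""
    problems = []
    for package, wanted in pins.items():
        found = lookup(package)
        if found is None:
            problems.append(f"{package}: not installed")
        elif wanted is not None and found != wanted:
            problems.append(f"{package}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINS) or ["all pinned packages match"]:
        print(line)
```

The lookup parameter exists only so the check can be exercised without touching the real environment; in normal use it can be left at its default.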

Judging from the error message, this is probably a torchaudio version problem. Please check whether your torchaudio version meets the requirements. We use a Python 3.8 virtual environment, so you can try pip install torchaudio==0.12.0.

Oh, thank you. I figured out the torchaudio problem myself, so I deleted my question yesterday XD.
Anyway, I ran into one more import error, this time about the transformers package.
I suggest adding transformers to requirements.txt (pip install transformers).
With that installed, it finally works fine. Thank you for your help :D

We don't need transformers in our dependencies, so maybe you can try uninstalling it instead. And if you find this repository useful, please consider giving it a star.