modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Error when running run_audio.sh in speaker diarization

Coconut059 opened this issue · comments

Hi, I got the following error when running the program; it looks like a package version problem.
run_audio.sh Stage 1: Prepare input wavs...
--2024-03-04 12:49:43-- https://modelscope.cn/api/v1/models/damo/speech_campplus_speaker-diarization_common/repo?Revision=master&FilePath=examples/2speakers_example.wav
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2528044 (2.4M) [application/octet-stream]
Saving to: 'examples/2speakers_example.wav'

2024-03-04 12:49:47 (708 KB/s) - 'examples/2speakers_example.wav' saved [2528044/2528044]

--2024-03-04 12:49:47-- https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/2speakers_example.rttm
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 380 [application/octet-stream]
Saving to: 'examples/2speakers_example.rttm'

2024-03-04 12:49:48 (1.19 MB/s) - 'examples/2speakers_example.rttm' saved [380/380]

run_audio.sh Stage2: Do vad for input wavs...
D:\Anaconda3\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:
D:\Anaconda3\lib\site-packages\numpy\.libs\libopenblas.FB5AE2TYXYH2IJRDKGDGQ3XBKLKTF43H.gfortran-win_amd64.dll
D:\Anaconda3\lib\site-packages\numpy\.libs\libopenblas64__v0.3.21-gcc_10_3_0.dll
warnings.warn("loaded more than 1 DLL from .libs:"
2024-03-04 12:49:50,197 - modelscope - INFO - PyTorch version 2.2.1 Found.
2024-03-04 12:49:50,199 - modelscope - INFO - Loading ast index from C:\Users\Coconuttt_\.cache\modelscope\ast_indexer
2024-03-04 12:49:50,356 - modelscope - INFO - Loading done! Current index file version is 1.10.0, with md5 6d2959ed63b0f2682e848d2d1a7b8118 and a total number of 946 components indexed
2024-03-04 12:49:53,171 - modelscope - INFO - Use user-specified model revision: v2.0.4
2024-03-04 12:49:53,553 - modelscope - WARNING - ('PIPELINES', 'voice-activity-detection', 'funasr-pipeline') not found in ast index file
Traceback (most recent call last):
File "local/voice_activity_detection.py", line 93, in
main()
File "local/voice_activity_detection.py", line 59, in main
vad_pipeline = pipeline(
File "D:\Anaconda3\lib\site-packages\modelscope\pipelines\builder.py", line 170, in pipeline
return build_pipeline(cfg, task_name=task)
File "D:\Anaconda3\lib\site-packages\modelscope\pipelines\builder.py", line 65, in build_pipeline
return build_from_cfg(
File "D:\Anaconda3\lib\site-packages\modelscope\utils\registry.py", line 198, in build_from_cfg
raise KeyError(
KeyError: 'funasr-pipeline is not in the pipelines registry group voice-activity-detection. Please make sure the correct version of ModelScope library is used.'

I tried several version combinations, e.g. ① funasr==0.8.4 with modelscope==1.10.0, ② funasr==0.8.7 with modelscope==1.10.0, ③ funasr==0.8.8 with modelscope==1.10.0, but none of them solved the problem.
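(Note: when trying version combinations like this, it helps to confirm which versions the interpreter actually imports at runtime, since a stale or shadowed install is a common cause of registry errors like the one above. A minimal sketch:

# Verify the versions Python actually loads. Both packages expose
# __version__ in recent releases; if not, importlib.metadata.version()
# is a fallback.
import funasr
import modelscope
print("funasr:", funasr.__version__)
print("modelscope:", modelscope.__version__)
)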

funasr 1.x.x has been released. Please update it to the latest version.

I updated funasr to the latest version, but the result is the same (modelscope==1.11.0):
KeyError: 'funasr-pipeline is not in the pipelines registry group voice-activity-detection. Please make sure the correct version of ModelScope library is used.'

Then I updated modelscope to the latest version (1.12.0), and a new error occurred.

2024-03-04 14:25:07,616 - modelscope - INFO - initiate model from C:\Users\Coconuttt_\.cache\modelscope\hub\iic\speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-03-04 14:25:07,616 - modelscope - INFO - initiate model from location C:\Users\Coconuttt_\.cache\modelscope\hub\iic\speech_fsmn_vad_zh-cn-16k-common-pytorch.
2024-03-04 14:25:07,619 - modelscope - INFO - initialize model from C:\Users\Coconuttt_\.cache\modelscope\hub\iic\speech_fsmn_vad_zh-cn-16k-common-pytorch
Traceback (most recent call last):
File "D:\Anaconda3\lib\site-packages\modelscope\utils\registry.py", line 212, in build_from_cfg
return obj_cls(**args)
File "D:\Anaconda3\lib\site-packages\modelscope\pipelines\audio\funasr_pipeline.py", line 62, in init
super().init(model=model, **kwargs)
File "D:\Anaconda3\lib\site-packages\modelscope\pipelines\base.py", line 100, in init
self.model = self.initiate_single_model(model, **kwargs)
File "D:\Anaconda3\lib\site-packages\modelscope\pipelines\base.py", line 53, in initiate_single_model
return Model.from_pretrained(
File "D:\Anaconda3\lib\site-packages\modelscope\models\base\base_model.py", line 183, in from_pretrained
model = build_model(model_cfg, task_name=task_name)
File "D:\Anaconda3\lib\site-packages\modelscope\models\builder.py", line 35, in build_model
model = build_from_cfg(
File "D:\Anaconda3\lib\site-packages\modelscope\utils\registry.py", line 184, in build_from_cfg
LazyImportModule.import_module(sig)
File "D:\Anaconda3\lib\site-packages\modelscope\utils\import_utils.py", line 475, in import_module
importlib.import_module(module_name)
File "D:\Anaconda3\lib\importlib_init_.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1014, in _gcd_import
File "", line 991, in _find_and_load
File "", line 975, in _find_and_load_unlocked
File "", line 671, in _load_unlocked
File "", line 843, in exec_module
File "", line 219, in _call_with_frames_removed
File "D:\Anaconda3\lib\site-packages\modelscope\models\audio\funasr\model.py", line 7, in
from funasr import AutoModel
ImportError: cannot import name 'AutoModel' from 'funasr' (unknown location)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "local/voice_activity_detection.py", line 93, in
main()
File "local/voice_activity_detection.py", line 59, in main
vad_pipeline = pipeline(
File "D:\Anaconda3\lib\site-packages\modelscope\pipelines\builder.py", line 170, in pipeline
return build_pipeline(cfg, task_name=task)
File "D:\Anaconda3\lib\site-packages\modelscope\pipelines\builder.py", line 65, in build_pipeline
return build_from_cfg(
File "D:\Anaconda3\lib\site-packages\modelscope\utils\registry.py", line 215, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
ImportError: FunASRPipeline: cannot import name 'AutoModel' from 'funasr' (unknown location)
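(Note: the "(unknown location)" in this ImportError usually means Python is resolving funasr to something other than a healthy installed package, for example a leftover source directory or an empty namespace package. A quick check of what is actually being imported, as a sketch:

import importlib.util

spec = importlib.util.find_spec("funasr")
print(spec)  # None means funasr is not installed at all
# spec.origin is the file 'funasr' resolves to; None (a bare namespace
# package) usually indicates a broken or shadowed install
print(spec.origin if spec else "funasr not found")

# on funasr >= 1.0 this import is expected to succeed
from funasr import AutoModel
)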

You can run "pip install --upgrade funasr". @Coconut059

Still the same error (after upgrading funasr, with modelscope==1.10.0 or 1.11.0):
KeyError: 'funasr-pipeline is not in the pipelines registry group voice-activity-detection. Please make sure the correct version of ModelScope library is used.'
When I update modelscope to 1.12.0, the error turns into:
ImportError: FunASRPipeline: cannot import name 'AutoModel' from 'funasr' (unknown location).
I really don't know what to do.

What is your funasr version? Please ensure it is the latest (1.0.11).

I've updated to the latest version but still get the same error, so I'm running it in Jupyter instead, and now I'm getting another error:
Computing DER...
2024-03-04 20:20:54,429 - INFO: Concatenating individual RTTM files...
Traceback (most recent call last):
File "/mnt/workspace/3D-Speaker-main/egs/3dspeaker/speaker-diarization/local/compute_der.py", line 72, in
main(args)
File "/mnt/workspace/3D-Speaker-main/egs/3dspeaker/speaker-diarization/local/compute_der.py", line 47, in main
[MS, FA, SER, DER_] = DER(
File "/mnt/workspace/3D-Speaker-main/egs/3dspeaker/speaker-diarization/local/DER.py", line 103, in DER
stdout = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
File "/opt/conda/lib/python3.10/subprocess.py", line 421, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/opt/conda/lib/python3.10/subprocess.py", line 503, in run
with Popen(*popenargs, **kwargs) as process:
File "/opt/conda/lib/python3.10/subprocess.py", line 971, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "/opt/conda/lib/python3.10/subprocess.py", line 1863, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: '/mnt/workspace/3D-Speaker-main/egs/3dspeaker/speaker-diarization/local/md-eval.pl'
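(Note: this PermissionError comes from subprocess trying to execute md-eval.pl directly when the file lacks the execute bit. Two common workarounds, as a sketch; the -r/-s reference/system flags are assumptions about md-eval.pl's CLI:

import os
import stat
import subprocess

script = "local/md-eval.pl"

# Option 1: set the execute bit so subprocess can run the script directly.
os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)

# Option 2: invoke it through the perl interpreter, which needs no execute bit.
out = subprocess.check_output(
    ["perl", script, "-r", "ref.rttm", "-s", "sys.rttm"],
    stderr=subprocess.STDOUT)
print(out.decode())
)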

I've updated "egs/3dspeaker/speaker-diarization/local/DER.py". You can fix it by pulling the new version.

Thanks a lot!!! It finally worked. Now I'd like to know how to run this model (speaker diarization) on my own dataset.

The diarization pipeline is based on a pretrained VAD model and a pretrained speaker embedding model. The VAD model used is "iic/speech_fsmn_vad_zh-cn-16k-common-pytorch" from ModelScope. You can train your own speaker embedding model with any speaker verification recipe in this repo, or use other pretrained speaker models from ModelScope by changing the value of "speaker_model_id" in run_audio.sh.
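(Note: for quick experiments on your own audio, besides editing the wav list that run_audio.sh consumes, the same kind of pretrained pipeline can be called directly from Python. A minimal sketch; the task string and model id follow the ModelScope conventions seen earlier in this thread and may need adjusting for your modelscope version, and my_audio.wav is a placeholder:

from modelscope.pipelines import pipeline

# the CAM++ diarization model downloaded in Stage 1 of run_audio.sh
sd = pipeline(
    task='speaker-diarization',
    model='damo/speech_campplus_speaker-diarization_common')

result = sd('my_audio.wav')  # placeholder path to your own recording
print(result)  # time-stamped segments with speaker labels
)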