modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Inference acceleration

yangyyt opened this issue · comments

When applying the speaker-classification module to hundreds of millions of audio samples, how can VAD and embedding extraction be run as batch inference?

Thanks in advance to the author for any reply and suggestions.

Batch processing is not currently supported; only multi-process and multi-GPU processing are available.
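For large-scale extraction, the multi-process/multi-GPU route usually amounts to sharding the wav list and launching one worker per GPU. A minimal sketch of that idea (this is not the repo's actual script; `extract_embedding.py` and its `--wavs` flag are hypothetical stand-ins):

```python
def shard(wav_list, num_shards):
    """Round-robin split: shard i handles files i, i+num_shards, ..."""
    return [wav_list[i::num_shards] for i in range(num_shards)]

def launch_commands(wav_paths, num_gpus):
    """Build one inference command per GPU (illustrative only; the real
    3D-Speaker recipes use their own extraction scripts and arguments)."""
    cmds = []
    for gpu, files in enumerate(shard(wav_paths, num_gpus)):
        cmds.append(
            "CUDA_VISIBLE_DEVICES=%d python extract_embedding.py --wavs %s"
            % (gpu, ",".join(files))
        )
    return cmds
```

Each command then runs independently, so throughput scales roughly with the number of GPUs even without true batched forward passes.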

> Batch processing is not currently supported; only multi-process and multi-GPU processing are available.

Thanks. Does ModelScope support the Chinese-English mixed speaker diarization model? I can't seem to find it on the website.
https://github.com/alibaba-damo-academy/3D-Speaker/tree/main/egs/3dspeaker/speaker-diarization

Also, are there plans to open-source batch inference in the future?

I tried the example on ModelScope (https://www.modelscope.cn/models/iic/speech_campplus_speaker-diarization_common/summary). For the same audio and the same embedding-extraction model, 3D-Speaker correctly separates multiple speakers but the ModelScope pipeline does not, so I suspect the ModelScope pipeline and 3D-Speaker differ in some parameters or logic. It also seems that ModelScope does not support batch inference either.

> Batch processing is not currently supported; only multi-process and multi-GPU processing are available.

> Thanks. Does ModelScope support the Chinese-English mixed speaker diarization model? I can't seem to find it on the website. https://github.com/alibaba-damo-academy/3D-Speaker/tree/main/egs/3dspeaker/speaker-diarization
>
> Also, are there plans to open-source batch inference in the future?

You can change the value of the variable `speaker_model_id` to `iic/speech_campplus_sv_zh_en_16k-common_advanced` to use the Chinese-English mixed model.
We will support batch inference in the future. @yangyyt
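If batch inference lands, the core change would be padding variable-length segments into one array and running a single forward pass over them. A hedged NumPy sketch of that padding step (the real models consume fbank features, not raw waveforms as shown here):

```python
import numpy as np

def pad_batch(segments, pad_value=0.0):
    """Stack variable-length 1-D waveforms into a (batch, max_len) array,
    padding the tail of shorter segments so one forward pass covers all."""
    max_len = max(len(s) for s in segments)
    batch = np.full((len(segments), max_len), pad_value, dtype=np.float32)
    lengths = np.zeros(len(segments), dtype=np.int64)
    for i, s in enumerate(segments):
        batch[i, : len(s)] = s
        lengths[i] = len(s)
    return batch, lengths  # lengths let the model mask out the padding
```

Returning the true lengths alongside the padded batch is what allows the model (or a pooling layer) to ignore the padded tail.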

> I tried the example on ModelScope (https://www.modelscope.cn/models/iic/speech_campplus_speaker-diarization_common/summary). For the same audio and the same embedding-extraction model, 3D-Speaker correctly separates multiple speakers but the ModelScope pipeline does not, so I suspect the ModelScope pipeline and 3D-Speaker differ in some parameters or logic. It also seems that ModelScope does not support batch inference either.

The inference processes of the two are almost identical, with only minor differences. If the outputs differ significantly, the input is probably short audio. Since the current pipeline is not robust on short audio, we recommend using longer audio (>1 min). @yangyyt
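Since short inputs are the usual culprit, one option is to guard the pipeline with a duration check before running diarization. A minimal sketch for WAV files using only the standard library (the 60-second threshold mirrors the >1 min recommendation above):

```python
import wave

def is_long_enough(wav_path, min_seconds=60.0):
    """Return True if the WAV file is at least `min_seconds` long;
    shorter files can be skipped or flagged before diarization."""
    with wave.open(wav_path, "rb") as f:
        duration = f.getnframes() / float(f.getframerate())
    return duration >= min_seconds
```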