modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Missing transcripts?

chenht2021 opened this issue

I read the FAQ on the project page, but some transcripts still seem to be missing. For example, speaker 3D_SPK_00001 does not appear in transcription/train_transcription or transcription/test_transcription.
Did I miss something, or are transcripts provided for only part of the data?

Currently, text annotations are available only for audio clips recorded with DIRECTIONAL devices. We chose to annotate clear, distinct audio rather than less intelligible data such as far-field recordings or dialect speech, because the dataset is primarily intended for speaker-related tasks. If further text annotations are released, we will update the information on our website.

Thanks for your explanation.
OK, this may be off topic; if it is not appropriate, please close it.
I read the LauraGPT paper. It says the TTS training data consists of LibriTTS and 3D-Speaker, copied 2 times, giving 5.0M samples.
The LibriTTS train set has about 206K samples and the full 3D-Speaker train set has about 643K; counting only annotated clips, the total would be even smaller.
So is the number of TTS training samples wrong? Should it be 500K?
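To make the question concrete, here is a rough upper-bound check in Python. It is only a sketch under stated assumptions: it uses the approximate counts quoted in this thread (about 206K LibriTTS and 643K 3D-Speaker train clips, not independently verified), reads "copied 2 times" as a total multiplier of 2, and ignores any other corpora or augmentation. The helper name `upper_bound_samples` is hypothetical.

```python
# Rough upper bound on TTS training samples from the counts quoted in this thread.
# Assumptions (hypothetical, not from the paper): ~206K LibriTTS clips,
# ~643K 3D-Speaker train clips, and "copied 2 times" meaning a total multiplier of 2.
# Other corpora and data augmentation are ignored.

def upper_bound_samples(libritts: int, speaker3d: int, copies: int) -> int:
    """Return the sample count if every clip from both sets is used `copies` times."""
    return (libritts + speaker3d) * copies


total = upper_bound_samples(libritts=206_000, speaker3d=643_000, copies=2)
print(f"{total:,}")         # 1,698,000
print(total < 5_000_000)    # True
```

Under these assumptions, even using every 3D-Speaker clip gives roughly 1.7M samples, well short of 5.0M, so the quoted corpora alone do not account for the reported figure.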

In the LauraGPT experiments, we used data from the highest-quality device in the 3D-Speaker dataset and applied some data augmentation. For specific data details, please refer to the original paper.

After double-checking with the authors, it appears that the LibriTTS figure you cited is smaller than expected. In addition, the TTS tasks also used data from AISHELL-1, AISHELL-2, and AISHELL-3, which was inadvertently omitted from the current preprint of our paper. We will correct this in subsequent revisions.