Regarding speaker recognition in PASE
Some-random opened this issue
I'm a bit confused about the experimental setup for speaker recognition described in the original PASE paper. If my understanding is correct, only 15 s × 2484 utterances, i.e. 10.35 hours of speech from LibriSpeech, is used for pretraining, and only 11 s × 109 utterances, i.e. 0.33 hours of speech from VCTK, is used for fine-tuning. Both numbers seem awfully small...
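To double-check my own math, here is a quick sketch of the arithmetic behind those figures (the clip counts and per-clip durations are my estimates from the paper, not confirmed numbers):

```python
def total_hours(num_clips: int, seconds_per_clip: float) -> float:
    """Total speech duration in hours for a set of fixed-length clips."""
    return num_clips * seconds_per_clip / 3600

# Assumed figures: 2484 LibriSpeech clips of ~15 s for pretraining,
# 109 VCTK clips of ~11 s for fine-tuning.
pretrain_hours = total_hours(2484, 15)   # ~10.35 hours
finetune_hours = total_hours(109, 11)    # ~0.33 hours
print(f"pretraining: {pretrain_hours:.2f} h, fine-tuning: {finetune_hours:.2f} h")
```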