baidu-research / ba-dls-deepspeech

Can this model be used for Speaker Recognition ?

nishanksinglasjsu opened this issue · comments

Hi,
I am working on speaker recognition. Is it possible to use this model for speaker recognition?
If yes, could you please give me some guidance? If not, could you point me to some deep learning models that I could use for it?

commented

Sure it can, but you will need to enlarge your training set to get more accurate results.

Thanks, HulkSun, for the reply.
I am happy to know that this model can be used for speaker recognition, though I am not sure how to use it.
Could you please explain a little about how I can use this model for speaker recognition? What would the dataset be?

commented

Hi nishanksinglasjsu,
You can read the paper, which explains how the model works and how to train it.

Hi HulkSun,
Thank you for the paper. I will definitely go through it.
I am a beginner in deep learning, especially in speech recognition models. I know CNNs very well but not RNNs.
The major problem I am facing is understanding the dataset. I understand that the input (X) is the spectrogram of an audio WAV file, but what is the output (y) in speech recognition?
According to the research papers I have read, for text-dependent speaker recognition I can use a CNN model in which the input (X) is the spectrogram image of an audio file and the output (y) is a one-hot vector of 1s and 0s, where the index of the 1 identifies a unique speaker or user, just like the class labels in the MNIST dataset.
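
To make the data layout I have in mind concrete, here is a minimal sketch using NumPy only. It is not taken from this repository: the helper names `log_spectrogram` and `one_hot`, the frame length, the hop size, and the random "audio" are all assumptions chosen for illustration.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram from a framed, Hann-windowed FFT.

    frame_len and hop are illustrative values, not settings from this repo.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small constant avoids log(0); shape is (n_frames, frame_len // 2 + 1).
    return np.log(magnitude + 1e-10)

def one_hot(speaker_ids, n_speakers):
    """One row per clip, with a single 1 at the speaker's index."""
    labels = np.zeros((len(speaker_ids), n_speakers), dtype=np.float32)
    labels[np.arange(len(speaker_ids)), speaker_ids] = 1.0
    return labels

# Hypothetical data: three 1-second clips at 16 kHz from speakers 0, 2 and 1.
rng = np.random.default_rng(0)
clips = rng.standard_normal((3, 16000))

X = np.stack([log_spectrogram(clip) for clip in clips])  # CNN input
y = one_hot(np.array([0, 2, 1]), n_speakers=3)           # CNN target
```

Here `X` holds one spectrogram "image" per clip and `y` holds one one-hot row per clip, so a CNN classifier with a softmax over the speakers could be trained on the pair in the same way as an MNIST classifier.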

Can you please tell me whether this approach to speaker recognition is right?