Can this model be used for speaker recognition?

Question

nishanksinglasjsu opened this issue 8 years ago

Hi,
I am working on speaker recognition. Is it possible to use this model for speaker recognition?
If yes, can you please give me some guidance? If not, can you point me to some deep learning models that I could use for it?

HULK · Answer 1 · Thu Nov 10 2016 16:14:38 GMT+0800 (China Standard Time)

Sure it can, but you must enlarge your training set to get more accurate results.

nishanksinglasjsu · Answer 2 · Fri Nov 11 2016 06:05:28 GMT+0800 (China Standard Time)

Thanks, HulkSun, for the reply.
I am happy to know that this model can be used for speaker recognition, though I am not sure how to use it.
Can you please explain briefly how I can use this model for speaker recognition? What would the dataset be?

HULK · Answer 3 · Fri Nov 11 2016 09:27:34 GMT+0800 (China Standard Time)

Hi nishanksinglasjsu,
You can read the paper, which explains how the model works and how to train it.

nishanksinglasjsu · Answer 4 · Fri Nov 11 2016 10:56:28 GMT+0800 (China Standard Time)

Hi HulkSun,
Thank you for the paper. I will definitely go through it.
I am a beginner in deep learning, especially in speech recognition models. I know CNNs very well, but not RNNs.
The major problem I am facing is understanding the dataset. I understand that the input (X) is the spectrogram of an audio WAV file, but what is the output (y) data in speech recognition?
According to the research papers I have read, for text-dependent speaker recognition I can use a CNN model in which the input (X) is the spectrogram image of an audio file and the output (y) is a vector of 1s and 0s, where the index of the 1 identifies a unique speaker or user, just like the labels in the MNIST dataset.

Can you please tell me whether this approach to speaker recognition is right?
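
For illustration, the labeling scheme described above (one spectrogram per utterance as X, one one-hot speaker vector per utterance as y) could be sketched roughly as follows. This is only a minimal sketch of the idea in the question, not the method used by this repository; the speaker names, the spectrogram shape, and the helper `one_hot_speaker_labels` are all hypothetical, and random arrays stand in for real spectrograms.

```python
import numpy as np

def one_hot_speaker_labels(speaker_ids):
    """Map each utterance's speaker ID to a one-hot row vector.

    Hypothetical helper: returns the label matrix y and the sorted list of
    speakers, so that column i of y corresponds to speakers[i].
    """
    speakers = sorted(set(speaker_ids))
    index = {speaker: i for i, speaker in enumerate(speakers)}
    y = np.zeros((len(speaker_ids), len(speakers)), dtype=np.float32)
    for row, speaker in enumerate(speaker_ids):
        y[row, index[speaker]] = 1.0
    return y, speakers

# Assumed toy data: 4 utterances spoken by 3 distinct speakers.
utterance_speakers = ["alice", "bob", "alice", "carol"]

# Placeholder spectrograms (frequency bins x time frames); real ones would be
# computed from the WAV files.
X = np.random.rand(len(utterance_speakers), 128, 100).astype(np.float32)

y, speakers = one_hot_speaker_labels(utterance_speakers)
print(speakers)
print(y)
```

Each row of `y` contains exactly one 1, in the column of the speaker who produced that utterance, which is the same shape of target used for MNIST-style classification.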