jefflai108 / Contrastive-Predictive-Coding-PyTorch

Contrastive Predictive Coding for Automatic Speaker Verification

How to combine MFCC and CPC features

cyl250 opened this issue · comments

Thank you for sharing your code. I have run into a problem.
When I use CPC, the features are [128, 256], but the MFCCs are [frame, 39].
Following your results, I wonder how to combine them into [frame, 39 + 256] dims.
Thanks again

hi @cyl250
It is common to combine features by simply concatenating them (along the feature dimension).

The CPC features are [num_frames, 256] and the MFCCs are [num_frames, 39]. Concatenating them gives [num_frames, 256 + 39].
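In case it helps, here is a minimal sketch of that concatenation (the variable names and random data are placeholders, not from this repo). Since CPC and MFCC framing can differ by a frame or two on the same utterance, a common workaround is to trim both to the shorter length first:

```python
import numpy as np

# Placeholder features for one utterance, frame-aligned as in the thread:
# CPC output is [num_frames, 256], MFCCs are [num_frames, 39].
num_frames = 128
cpc = np.random.randn(num_frames, 256)       # e.g. output of model.predict()
mfcc = np.random.randn(num_frames + 1, 39)   # framing may differ slightly

# Trim to the shorter stream, then concatenate along the feature axis.
n = min(cpc.shape[0], mfcc.shape[0])
combined = np.concatenate([cpc[:n], mfcc[:n]], axis=1)
print(combined.shape)  # (128, 295)
```

The same pattern works with `torch.cat([cpc, mfcc], dim=1)` if you keep everything as tensors.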

Sorry, I am having some trouble understanding.
After model.predict(), the CPC features have shape [128, 256].
Do I need to change the number of nodes in the network so that model.predict() returns [num_frames, 256] vectors?

128 is the number of frames during TRAINING. In CPC training, random chunks of the raw waveform are selected and fed to the encoder. For example, a random chunk of 20480 samples corresponds to 1.28 seconds, or 128 frames, for 16 kHz audio.
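The arithmetic behind those numbers can be sketched as follows, assuming the encoder's total downsampling factor is 160 samples per frame (my assumption; it is what makes 20480 samples come out to 128 frames at 16 kHz):

```python
# Frame arithmetic for the CPC training chunks.
sample_rate = 16000     # Hz, as stated above
chunk_samples = 20480   # samples per random training chunk
hop = 160               # assumed encoder downsampling factor (samples/frame)

duration_s = chunk_samples / sample_rate  # 1.28 seconds
num_frames = chunk_samples // hop         # 128 frames
print(duration_s, num_frames)  # 1.28 128
```

At inference time the same arithmetic applies to the full utterance length, so the frame count varies per utterance instead of being fixed at 128.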

During inference, you should input the entire utterance instead of fixed-size chunks. This will give you the correct number of frames for that utterance rather than a fixed 128.