AryaAftab / LIGHT-SERNET

Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition

MFCC hop size problem.

yihliang831209 opened this issue

"Good job on the paper. However, there seems to be a discrepancy in the hop size between the paper and the provided code. The paper states that a Hamming window is used to split the audio signal into 64-ms frames with 16-ms overlaps, which are treated as quasi-stationary segments. It follows that the hop size is 64 ms - 16 ms = 48 ms.

However, hyperparameters.py sets FRAME_STEP = 256. At a sampling rate (fs) of 16 kHz, this gives a hop size of 256 / 16000 s = 16 ms, not 48 ms. Could you please clarify whether this is a typographical error in the paper, or whether there is a specific reason for the inconsistency?"
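
For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic in the question. It uses only the values quoted above (fs = 16 kHz, FRAME_STEP = 256, 64-ms frames, 16-ms overlaps). The helper names `samples_to_ms` and `ms_to_samples` are made up for illustration and are not taken from the repository.

```python
FS = 16_000        # sampling rate in Hz, as quoted in the issue
FRAME_STEP = 256   # hop size in samples, as quoted from hyperparameters.py


def samples_to_ms(n_samples: int, fs: int) -> float:
    """Convert a length in samples to milliseconds (illustrative helper)."""
    return 1000.0 * n_samples / fs


def ms_to_samples(ms: float, fs: int) -> int:
    """Convert a length in milliseconds to samples (illustrative helper)."""
    return round(ms * fs / 1000.0)


# Hop size implied by the code.
hop_code_ms = samples_to_ms(FRAME_STEP, FS)

# Hop size implied by the paper's wording: frame length minus overlap.
frame_len = ms_to_samples(64, FS)
overlap = ms_to_samples(16, FS)
hop_paper = frame_len - overlap
hop_paper_ms = samples_to_ms(hop_paper, FS)

# Overlap that a 64-ms frame would have if the hop really is FRAME_STEP.
overlap_if_code_hop_ms = samples_to_ms(frame_len - FRAME_STEP, FS)

print(f"hop from code:  {FRAME_STEP} samples = {hop_code_ms} ms")
print(f"hop from paper: {hop_paper} samples = {hop_paper_ms} ms")
print(f"overlap if hop is FRAME_STEP: {overlap_if_code_hop_ms} ms")
```

Assuming the frame length in the code matches the paper's 64 ms (1024 samples), a 256-sample step corresponds to a 16-ms hop and a 48-ms overlap, so one possible reading is that the paper swapped the two figures. Only the authors can confirm which value was actually used in the experiments.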