Speech Feature Extraction

This repository collects feature extraction methods for speech signals.

Free speech datasets

OpenSLR: OpenSLR is a site devoted to hosting speech and language resources, such as training corpora for speech recognition, and software related to speech recognition.
VoxForge: VoxForge now mirrors the Open Speech Data Corpus for German from the LT and Telecooperation groups, with 35 hours of speech from about 180 speakers.
TIMIT: The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Mozilla Speech: Mozilla released the world's second-largest public voice data set on Nov 29th, 2017.
Open Data for Deep Learning

feature_extraction_functions.py: a set of feature extraction functions from RDShi-SpeakerCount.
MFCC: Mel-frequency cepstral coefficients calculation.
- MFCC.py, MFCCTest.py: Compute the MFCC feature.
- FeatureExtraction.ipynb: Speech preprocessing, including loading data, pre-emphasis, framing, windowing, Fourier transform, power spectrum, filter banks, MFCCs and mean normalization.
Volume: volume calculation.
ZeroCR: Zero-Crossing Rate calculation.
Pitch: Pitch calculation and pitch tracking.
Timbre: spectrogram drawing.
VAD: Voice Activity Detection (VAD), also called End-Point Detection (EPD) or speech detection.
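
The features listed above can be sketched with NumPy alone. The block below is a minimal illustration, not the code in MFCC.py or FeatureExtraction.ipynb: the function names (`frame_signal`, `mel_filterbank`, `mfcc`, `volume_db`, `zero_crossing_rate`) and the defaults (25 ms frames, 10 ms hop, 512-point FFT, 26 mel filters, 13 coefficients, pre-emphasis 0.97) are assumptions chosen for the example. It follows the steps named for the notebook: pre-emphasis, framing, windowing, Fourier transform, power spectrum, filter banks, MFCCs and mean normalization.

```python
import numpy as np


def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames, zero-padding the tail."""
    n_frames = 1 + max(0, int(np.ceil((len(signal) - frame_len) / hop_len)))
    pad_len = (n_frames - 1) * hop_len + frame_len
    padded = np.append(signal, np.zeros(pad_len - len(signal)))
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return padded[idx]


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    def hz_to_mel(hz):
        return 2595.0 * np.log10(1.0 + hz / 700.0)

    def mel_to_hz(mel):
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising slope
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(signal, sample_rate, frame_ms=25, hop_ms=10, n_fft=512,
         n_filters=26, n_ceps=13, pre_emph=0.97):
    """Return mean-normalized MFCCs with shape (n_frames, n_ceps)."""
    # Pre-emphasis: y[t] = x[t] - a * x[t-1]
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    frame_len = int(round(sample_rate * frame_ms / 1000))
    hop_len = int(round(sample_rate * hop_ms / 1000))
    # Framing and Hamming window (assumes frame_len <= n_fft)
    frames = frame_signal(emphasized, frame_len, hop_len) * np.hamming(frame_len)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Log mel filter-bank energies (floored to avoid log(0))
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))
    # Unnormalized DCT-II along the filter axis, keeping the first n_ceps terms
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    ceps = log_energies @ basis.T
    # Mean normalization per coefficient
    return ceps - ceps.mean(axis=0)


def volume_db(frames):
    """Per-frame volume as log energy in decibels."""
    energy = np.sum(frames ** 2, axis=1)
    return 10.0 * np.log10(np.maximum(energy, np.finfo(float).eps))


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame that change sign."""
    signs = np.where(frames >= 0, 1, -1)
    return np.mean(np.diff(signs, axis=1) != 0, axis=1)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)   # synthetic 1 s test tone
    raw_frames = frame_signal(tone, 400, 160)
    print(mfcc(tone, sr).shape)
    print(volume_db(raw_frames)[:3])
    print(zero_crossing_rate(raw_frames)[:3])
```

A simple energy-based VAD could threshold the `volume_db` output per frame, although the threshold would need tuning for each recording condition.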

Anaconda3 (Python3.x)

Feature extraction from the speech signal is the first stage of any speech recognition system.
