Repositories under the mfcc-features topic:
Voice Activity Detection based on Deep Learning & TensorFlow
Audio feature extraction and classification
Repository for CIKM 2020 resource track paper: MAEC: A Multimodal Aligned Earnings Conference Call Dataset for Financial Risk Prediction
Using a Raspberry Pi, we listen to the coffee machine and count how many coffees are made
Tiny Machine Learning Snoring Detection Model for Embedded devices
A RESTful API implementation of an authentication system using voice fingerprints
Multi-class audio classification with MFCC features using CNN
stm32-speech-recognition-and-traduction is a project developed for the Advances in Operating Systems exam at the University of Milan (academic year 2020-2021). It implements a speech recognition and speech-to-text translation system using a pre-trained machine learning model running on the stm32f407vg microcontroller.
MFCC features + SVM for speech emotion classification
A Python implementation of STFT and MFCC audio features from scratch
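As a rough illustration of what such a from-scratch implementation involves (this is a minimal NumPy sketch with assumed default parameters, not the repository's actual code), the standard MFCC pipeline is: frame and window the signal, take the FFT magnitude, pool the power spectrum through a triangular mel filterbank, take the log, and decorrelate with a DCT-II:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def stft_mag(signal, frame_len=400, hop=160, n_fft=512):
    # Frame the signal, apply a Hann window, take the real-FFT magnitude.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n_fft))

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fb[i - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    return fb

def mfcc(signal, sr=16000, n_coeffs=13):
    power = stft_mag(signal) ** 2
    log_mel = np.log(power @ mel_filterbank(sr=sr).T + 1e-10)
    # DCT-II over the filter axis decorrelates the log-mel energies.
    n_filters = log_mel.shape[1]
    n = np.arange(n_filters)
    basis = np.cos(np.pi * (n[None, :] + 0.5)
                   * np.arange(n_coeffs)[:, None] / n_filters)
    return log_mel @ basis.T  # shape: (n_frames, n_coeffs)
```

For one second of 16 kHz audio with these frame settings, this yields a (98, 13) coefficient matrix; real implementations typically add pre-emphasis and liftering on top.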
A corpus that can be used to train English-to-Italian End-to-End Speech-to-Text Machine Translation models
Java Implementation of the Sonopy Audio Feature Extraction Library by MycroftAI
Voice Activity Detector based on MFCC features and DNN model
An automatic speaker recognition system built from digital signal processing tools, Vector Quantization and LBG algorithm
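The LBG (Linde-Buzo-Gray) algorithm mentioned above builds a speaker's codebook by repeatedly splitting centroids and refining them with k-means-style updates. A minimal sketch, assuming the codebook size is a power of two (parameter names are illustrative, not the repository's API):

```python
import numpy as np

def lbg_codebook(features, n_codewords=8, eps=0.01, n_iter=20):
    # Linde-Buzo-Gray vector quantization: start from the global mean,
    # double the codebook by perturbed splitting, then refine.
    # Assumes n_codewords is a power of two.
    codebook = features.mean(axis=0, keepdims=True)
    while len(codebook) < n_codewords:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(n_iter):
            # Assign each feature vector to its nearest codeword.
            dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :],
                                   axis=2)
            labels = dists.argmin(axis=1)
            for k in range(len(codebook)):
                members = features[labels == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook
```

For speaker recognition, one codebook is trained per enrolled speaker on their MFCC vectors; a test utterance is attributed to the speaker whose codebook gives the lowest average quantization distortion.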
Deep learning-based audio spoofing attack detection experiments for speaker verification.
Audio classification using a simple SVM classifier making use of MFCC and Spectrogram features coded from scratch
A repository for the USTH Digital Signal Processing 2020 Group 3 project. The title says it all.
In this challenge, the goal is to learn to recognize which of several English words is pronounced in an audio recording. This is a multiclass classification task.
Signal Processing Course project
Audio command recognition by DTW and classification
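The core of DTW-based command recognition is a dynamic-programming alignment cost between two feature sequences of different lengths; a test utterance is labeled with the reference template that yields the lowest cost. A minimal sketch (the function name and signature are illustrative, not this repository's code):

```python
import numpy as np

def dtw_distance(a, b):
    # a, b: (n_frames, n_features) feature sequences, e.g. MFCCs.
    # Classic O(n*m) dynamic program over the frame-to-frame cost grid.
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # Best of: insertion, deletion, or diagonal match.
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]
```

Because DTW allows one frame to align with several frames of the other sequence, two utterances of the same command spoken at different speeds can still match with near-zero cost.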
Classify and recognize emotions through voice signal in a foreign language
A project for classifying COVID and non-COVID patients from cough sounds, using a CRNN-Attention model with the audio converted into image data
Classification of urban sounds such as air conditioner, jackhammer, drilling, siren, street music, engine idling, and children playing, using Mel-frequency cepstral coefficients (MFCCs) as audio features and a CNN.
Classify music into two categories, progressive rock and non-progressive rock, using MFCC features, an MLP, and a CNN.
This project was my final Bachelor's degree thesis, in which I combined my passion, music, with the subject I liked most in my degree: deep learning.
Implementation of Mel-Frequency Cepstral Coefficients (MFCC) extraction
RespireNet is an innovative web-based application that harnesses the capabilities of deep learning and Mel-frequency cepstral coefficients (MFCC) as a feature extraction technique for accurate respiratory disease prediction. The primary objective of this user-friendly web application is to facilitate early detection.
Implementation of Persian Isolated-Digits Recognition with Matlab
Bali has a diversity of arts recognized worldwide; one of the most famous is the Karawitan art, especially the Kendang Tunggal instrument. Notation documentation, more commonly known as music transcription, makes learning a song easier, and in this research it makes it easier to learn to play the Kendang Tunggal. The first step in documenting a kendang tunggal song is onset detection: an onset occurs when the signal enters its attack period, which helps segment the sound colors (timbres) of the drum. The segmented timbres are classified with the Backpropagation algorithm, using several frequency-domain and time-domain features as timbre characteristics. The kendang tunggal song is then resynthesized with a Mel Log Spectrum Approximation filter. The research found the optimal parameters for onset-based segmentation to be a hop size of 110 with normalization of the onset detection function's features. The best Backpropagation architecture (learning rate 0.9, 10 neurons, 2000 epochs) reached an accuracy of 60.85%, and the Mel Log Spectrum Approximation synthesis produced sounds similar to kendang songs with an accuracy of 83.33%.
⚙ Development of an emotion recognition model ⚙
Development of a Voice Activity Detector and a Speaker Recognition System. Feature extraction in the time and frequency domains; classification among ten individual speakers.
Recognizing spoken Bangla numbers using MFCCs and a CNN.
👉 This repository contains basic audio 🔊 processing code with feature extraction explained. 🎶 🎶 🎶