Repositories under the mel-spectrogram topic:
Implementation of Neural Voice Cloning with Few Samples Research Paper by Baidu
Recurrent Neural Network for generating piano MIDI-files from audio (MP3, WAV, etc.)
This repository contains a PyTorch implementation of four different models for speech emotion classification.
CNN 1D vs 2D audio classification
A librosa-style STFT/Fbank/MFCC feature extraction written in PyTorch using 1D convolutions.
Linear Prediction Coefficients estimation from mel-spectrogram implemented in Python based on Levinson-Durbin algorithm.
Urban sound source tagging from an aggregation of four-second noisy audio clips via 1D and 2D CNNs (Xception)
Zafar's Audio Functions in Python for audio signal analysis: STFT, inverse STFT, mel filterbank, mel spectrogram, MFCC, CQT kernel, CQT spectrogram, CQT chromagram, DCT, DST, MDCT, inverse MDCT.
Zafar's Audio Functions in Matlab for audio signal analysis: STFT, inverse STFT, mel filterbank, mel spectrogram, MFCC, CQT kernel, CQT spectrogram, CQT chromagram, DCT, DST, MDCT, inverse MDCT.
Attention-based Hybrid CNN-LSTM and Spectral Data Augmentation for COVID-19 Diagnosis from Cough Sound
Signal classification and recognition based on mel spectrograms
Framework for one-shot multispeaker system based on Deep Learning
Code for "Deep Learning Based EDM Subgenre Classification using Mel-Spectrogram and Tempogram Features" arXiv:2110.08862, 2021.
Basic wavenet and fftnet vocoder model.
Open Source Implementation of Neural Voice Cloning with Few Audio Samples (Baidu Research)
Cough detection with Log Mel Spectrogram, Wavelet Transform, Deep learning and Transfer learning techniques
This study converts piano recordings to mel spectrograms and classifies them with fine-tuned, pre-trained SOTA neural network backbones from computer vision. Comparative experiments show that SqueezeNet achieves the best classification accuracy, 92.37%.
Master's Thesis: Automatic Tagging of Musical Compositions Using Machine Learning Methods
Speech Recognition and Voice Activity Detection using a Convolutional Neural Network Architecture built with Tensorflow.js
This repository contains the Python code for an audio classification system designed to detect gunshots in urban settings.
Java Implementation of the Sonopy Audio Feature Extraction Library by MycroftAI
Least-squares (sparse) spectral estimation and (sparse) LPV spectral decomposition.
Speech Emotion Recognition using Deep Learning
Deep Multi-Speech model
Compute the MFCCs and measure (dis)similarity between two audio files using DTW
This model analyzes and predicts the input sound, then cancels it using pretrained ANC systems.
Speech Emotion Recognition (SER) in Tensorflow using CNNs and CRNNs Based on Mel Spectrograms and Mel Frequency Cepstral Coefficients (MFCCs)
Music genre classification using deep learning
Zafar's Audio Functions in Julia for audio signal analysis: STFT, inverse STFT, CQT kernel, CQT spectrogram, CQT chromagram, MFCC, DCT, DST, MDCT, inverse MDCT.
In this project we use a lightweight CNN-based model to classify instruments from the Freesound audio data set. Mel-spectrogram features extracted from the input audio serve as the input to the CNN model. To add robustness to the model, we use a novel data augmentation technique based on the Cut-Mix algorithm.
Convert audio file to melgram (that is, mel-spectrogram) in .NET
Analyzing Vibrational Data of the System using Machine Learning
This repository contains different CNN methods for audio classification. It starts with canceling noise from audio. Then it converts the audio into a mel-spectrogram and trains with CNN models.
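
Most of the entries above share one preprocessing step: turning audio into a mel spectrogram. The following is a minimal sketch of that step using only the Python standard library. It is not taken from any of the listed repositories. The function names, the HTK-style mel formula, the Hann window, and the parameter values (`n_fft=256`, `hop=128`, `n_mels=20`) are all assumptions chosen for illustration, and the naive DFT is far too slow for real use.

```python
import cmath
import math


def hz_to_mel(hz):
    """HTK-style mel scale (an assumption; other conventions exist)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Build n_mels triangular filters over the n_fft // 2 + 1 spectrum bins."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Hann-windowed power spectrum via a naive O(n^2) DFT (demo only)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2)
    return spectrum


def mel_spectrogram(signal, sample_rate, n_fft=256, hop=128, n_mels=20):
    """Return a list of frames, each a list of n_mels band energies."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        frames.append([sum(w * p for w, p in zip(row, power)) for row in bank])
    return frames


# Usage: a synthetic 1 kHz tone sampled at 8 kHz (hypothetical test signal).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(1024)]
mel = mel_spectrogram(tone, sample_rate)
loudest_band = max(range(len(mel[0])), key=lambda m: mel[0][m])
```

In practice the listed projects would more likely rely on an FFT-based library routine, and usually take the logarithm of the band energies before feeding them to a CNN.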