MinSang Baek's repositories
Audio-visual-sound-localization
Audio-visual sound localization
AudioFile
A simple C++ library for reading and writing audio files.
awesome-mac
Now much larger than originally planned. A collection of premium software in various categories.
clarity
Clarity Challenge toolkit - software for building Clarity Challenge systems
CMGAN
Conformer-based Metric GAN for speech enhancement
cocktail-fork-separation
Baseline multi-resolution cross network model trained using the Divide and Remaster Dataset
ControlNet
Let us control diffusion models!
DeepWaveTorch
DeepWave: A Recurrent Neural-Network for Real-Time Acoustic Imaging (PyTorch implementation)
DisVoice
feature extraction from speech signals
DnnNormTimeFreq4DoA
A DNN based Normalized Time-frequency Weighted Criterion for Robust Wideband DoA Estimation
ffc_se
Code for the paper "FFC-SE: Fast Fourier Convolution for Speech Enhancement" (published at Interspeech 2022 conference)
fundsp
Audio DSP library for audio processing and synthesis
gss
A simple package for Guided source separation (GSS)
ivy
The Unified Machine Learning Framework
Learning_Neural_Acoustic_Fields
Official code for "Learning Neural Acoustic Fields"
Multi-clue-TSE-data
Data simulation scripts for paper "Target Sound Extraction with Variable Cross-modality Clues"
nara_wpe
Different implementations of "Weighted Prediction Error" for speech dereverberation
NKF-AEC
Acoustic Echo Cancellation with Neural Kalman Filtering
opensmile
The Munich Open-Source Large-Scale Multimedia Feature Extractor
padertorch
A collection of common functionality to simplify the design, training, and evaluation of machine learning models based on PyTorch, with an emphasis on speech processing.
shap
A game theoretic approach to explain the output of any machine learning model.
sigsep-mus-eval
museval - source separation evaluation tools for Python
sms_wsj
SMS-WSJ: Spatialized Multi-Speaker Wall Street Journal database for multi-channel source separation and recognition
sound-spaces
A first-of-its-kind acoustic simulation platform for audio-visual embodied AI research. It supports training and evaluating multiple tasks and applications.
tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
visqol
Perceptual Quality Estimator for speech and audio