Alison Bernice Ma's repositories
ai-audio-startups
Community list of startups working with AI in audio and music technology
astronify
Astronomical data sonification.
audio-diffusion-pytorch
Audio generation using diffusion models, in PyTorch.
auraloss
Collection of audio-focused loss functions in PyTorch
CLAP
Contrastive Language-Audio Pretraining
CLAP-microsoft
Learning audio concepts from natural language supervision
CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
ddsp
DDSP: Differentiable Digital Signal Processing
descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1/24 kHz mono/stereo audio.
jukebox
Code for the paper "Jukebox: A Generative Model for Music"
micro-tcn
Efficient neural networks for analog audio effect modeling
MIR_ismir2018-oss-tutorial
ISMIR2018 Tutorial on Open Source and Reproducibility in MIR Research
MIR_openl3
OpenL3: Open-source deep audio and image embeddings
ML_embedding-playbook
You want to embed your Tableau content in lots of places. Start here.
encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
pedalboard
A Python library for working with audio.
PUBLICATIONS_paperTemplates
Repository for paper templates in ISMIR Proceedings
Real-Time-Voice-Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Resemblyzer
A Python package to analyze and compare voices with deep learning
riffusion
Stable diffusion for real-time music generation
riffusion-app
Stable diffusion for real-time music generation (web app)
sample-generator
Tools to train a generative model on arbitrary audio samples
SoundStream
This repository is an implementation of this article: https://arxiv.org/pdf/2107.03312.pdf
speechmetrics
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
tiny-audio-diffusion
A repository for generating and training short audio samples with unconditional waveform diffusion on accessible consumer hardware (<2GB VRAM GPU)
visqol
Perceptual Quality Estimator for speech and audio
VoiceLab
Automated Reproducible Acoustical Analysis
WaveRNN
WaveRNN Vocoder + TTS