Getting Started in 'ML-Audio'

Suggestions for students.

About

Audio and acoustics students sometimes ask "How do I get started learning machine learning?" Not everyone gets their start in a major research environment.

This page began after @drscotthawley felt sufficiently embarrassed about not having a coherent answer. Until someone creates an "ML for Audio" online course -- update 1/7/20: See Valerio Velardo's "Deep Learning for Audio"! -- this page may prove helpful.

Notes:

This is a collaborative page. Please suggest additions, re-organizations, edits, updates, etc., either via Issues or Pull Requests. (In addition, @drscotthawley may gladly cede control of this content to whichever student or group wants to Wiki-fy it!)

Introductory Remarks

"Read all the tutorials and papers you can, watch videos of all the talks you can, try out and modify whatever code you can get your hands on, take whatever courses you can find, go to whatever conferences you can. Try to build your own system, and spend all your nights and weekends improving it."

This was the best advice some of us could give, because it was the path we took. Some such stories are shared below. This page is an attempt to offer something more "direct" for newcomers.

Essays / Reflections / Autobiographical Sketches

Many practitioners took very different interdisciplinary paths, learning from a hodgepodge of information to complement their existing strengths and fill in gaps in their knowledge. Here are some stories.

(For submissions: Either link to elsewhere on the web, or add a file to the repo via PR. Try to make submissions conclude with a section on what you would say to new students.)

How __[someone]__ got started
__[a young person]'s__ story
...your name(s) here!...Chris Donahue, Christian Steinmetz, Jordi Pons, Keunwoo Choi, Faro, Justin Salamon,...?

Active Practitioners to Follow

Many of us learn about and contribute to news of new developments, papers, conferences, grants, and networking opportunities via Twitter.

Audio ML Twitter list by Fabian-Robert Stöter (@faroit). <-- Follow these people!

Quick Quotes

Justin Salamon: "Anyone working in ML, anyone, should be obliged to curate a dataset before they're allowed to train a single model. The lessons learnt in the process are invaluable, and the dangers of skipping said lessons are manifold (see what I did there?)"

General Reference Information

Machine Learning Glossary - A reference resource for common ML math topics, definitions, concepts, etc.
Notes on Music Information Retrieval

Online Courses

Valerio Velardo's "Deep Learning for Audio"
Andrew Ng's ML Course on Coursera (Good all-around ML course)
Fast.ai (Can get you up and running fast)
Rebecca Fiebrink's Machine Learning for Musicians and Artists on Kadenze (No math!)
Neural Network Programming - Deep Learning with PyTorch. Learn how to code an image-predictor neural network in PyTorch. Provides practical NN fundamentals.
Advanced Digital Signal Processing series taught by Dr.-Ing Gerald Schuller of Fraunhofer IDMT, with videos and accompanying Jupyter notebooks by Renato Profeta
Foundations of Machine Learning taught by David Rosenberg

Tutorials

(I'm often underwhelmed with audio-specific tutorials, actually. No offense! Feel free to suggest some. Here are a couple on related topics that I've found inspiring)

Andrew Trask's "Anyone Can Learn To Code an LSTM-RNN in Python"
Machine Learning & Deep Learning Fundamentals (Good high level intro to ML concepts and how neural networks operate)

Talks (at conferences)

that we found helpful/inspiring (and are hopefully still relevant)

Paris Smaragdis at SANE 2015: "NMF? Neural Nets? It’s all the same..."
Ron Weiss at SANE 2015: "Training neural network acoustic models on waveforms"
Jordi Pons at DLBCN 2018: "Training neural audio classifiers with few data"
Sander Dieleman at ISMIR 2019: "Generating Music in the Waveform Domain"

Key Papers / Codes

(Let's try to list "representative" or "landmark" papers, not just our latest tweak, unless it includes a really good intro/review section. ;-) )

Keunwoo Choi et al., "Automatic tagging using deep convolutional neural networks" (ISMIR 2016 Best Paper)
SampleRNN
WaveNet
WaveRNN, i.e. "Efficient Neural Audio Synthesis"
GANSynth
Wave-U-Net

Demos

(Not sure if this only means "deployed models you can play with in your browser," or if other things should count as demos)

Chris Donahue's WaveGAN Demo
Scott Hawley's SignalTrain Demo
Neil Zeghidour and David Grangier's Wavesplit
David Samuel, Aditya Ganeshan, and Jason Naradowsky's Meta-TasNet

Packages & Libraries

awesome-python-scientific-audio Curated list of python software and packages related to scientific research in audio
Librosa Great package for various kinds of audio analysis and manipulation
Audiomentations, data augmentation for audio
tf.signal: signal processing for TensorFlow
fastai_audio (and fastai2_audio), audio libraries for the Fast.ai library/MOOC. Fast.ai is primarily for image, text & tabular data processing, but there are efforts to add audio. (Work in progress.)

Tools / GUIs / Gists

Jesse Engel's gist to plot "rainbowgrams"

Books

Neural Networks and Deep Learning online book. This is where @drscotthawley first started reading.

Computer-Related Topics

Python:

learnpython.org
Python notebooks for fundamentals of music processing

Signal Processing Topics

Advanced Digital Signal Processing series taught by Dr.-Ing Gerald Schuller of Fraunhofer IDMT, with videos and accompanying Jupyter notebooks by Renato Profeta
Yuge Shi's "Gaussian Processes, Not Quite for Dummies"

Statistics / Math Topics

Gradient Descent
Principal Component Analysis: "PCA From Scratch" by @drscotthawley
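
For readers who have not met gradient descent before, here is a minimal sketch of the idea in plain Python. It is not taken from the linked resource; the function name, the learning rate, and the toy objective f(x) = (x - 3)^2 are all assumptions chosen for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a 1-D function given its derivative `grad`, starting from x0."""
    x = x0
    for _ in range(steps):
        # Move a small step in the direction opposite to the slope.
        x = x - learning_rate * grad(x)
    return x


# Hypothetical toy problem: f(x) = (x - 3)^2 has derivative 2 * (x - 3)
# and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 4))  # prints 3.0
```

Training a neural network follows the same loop, except that x becomes the full set of model weights and the gradient is computed by backpropagation.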

Datasets (raw audio)

One finds that many supposed "audio datasets" are really only features or even just metadata! Here are some "raw audio" datasets:

NSynth Musical Instruments
GTZAN Genre Collection (Note critique by Bob Sturm)
Fraunhofer IDMT Guitar/Bass Effects
Urban Sound Dataset
FreeSound Annotator (formerly FreeSound Datasets)
Birdvox-Full-Night
SignalTrain LA2A
Kaggle Heartbeat Sounds
Search for other audio datasets at Kaggle (list)
A collated list of MIR datasets can be found here, which is the source for audiocontentanalysis.org, but only some are raw audio
Another list of "audio datasets" by Christopher Dossman
...your dataset here...

"Major" ML-Audio Research/Development Groups

Universities:

(or, "Where should I apply for grad school?")

QMUL (London)
UPF (Barcelona)
CCRMA (Stanford, San Francisco)
IRCAM (Paris)
NYU (New York)

Industry:

("Where can I get an internship/job?")

Google Magenta
Google Perception (speech publications)
Adobe
Spotify
Increasingly, everywhere. ;-)

Conferences

("Which conference(s) should I go to?" -- asked by a student on the day this doc began)

Audio-Specific

A long list of music-technology-specific conferences is at https://conferences.smcnetwork.org/, which is referenced from https://github.com/MTG/conferences

Audio Engineering Society (AES)
ASA
Digital Audio Effects (DAFx)
ICASSP
ISMIR
SANE
Web Audio Conference (WAC)
SMC
LVA/ICA
Audio Mostly
WIMP
DCASE
CSMC
MuMe
ICMC
CMMR
IBAC
MLSP
Interspeech
FMA

General ML

ICLR
ICML
NeurIPS
IJCNN

Journals

("Where can I get published?")

IEEE TASLP
JAES
CMJ
JNMR
TISMIR
JASA
EURASIP Journal on Audio Speech and Music Processing

In addition, in machine learning specifically, the tendency is for conference papers to be peer-reviewed and to "count" as journal publications.

Competitions / Benchmarks

Some are yearly, some may be defunct but still interesting.

MIREX
SiSEC (Signal Separation Evaluation Campaign)
Kaggle Heartbeat Sounds

Contributors

Ryan Miller

If you want your name listed here, you may. ;-)