Getting Started in 'ML-Audio'
Suggestions for students.
About
Audio and acoustics students sometimes ask "How do I get started learning machine learning?" Not everyone gets their start in a major research environment.
This page began after @drscotthawley felt sufficiently embarrassed about not having a coherent answer. Until someone creates an "ML for Audio" online course -- update 1/7/20: See Valerio Velardo's "Deep Learning for Audio"! -- this page may prove helpful.
Notes:
- This is a collaborative page. Please suggest additions, re-organizations, edits, updates, etc., either via Issues or Pull Requests. (In addition, @drscotthawley may gladly cede control of this content to whichever student or group wants to Wiki-fy it!)
Introductory Remarks
"Read all the tutorials and papers you can, watch videos of all the talks you can, try out and modify whatever code you can get your hands on, take whatever courses you can find, go to whatever conferences you can. Try to build your own system, and spend all your nights and weekends improving it."
This was the best advice some of us could give, because it was the path we took. Some such stories are shared below. This page is an attempt to offer something more "direct" for newcomers.
Essays / Reflections / Autobiographical Sketches
Many practitioners took very different interdisciplinary paths, learning from a hodgepodge of information, in order to complement their existing strengths and fill in gaps in their knowledge. Here are some stories.
(For submissions: Either link to elsewhere on the web, or add a file to the repo via PR. Try to make submissions conclude with a section on what you would say to new students.)
- How __[someone]__ got started
- __[a young person]'s__ story
- ...your name(s) here!...Chris Donahue, Christian Steinmetz, Jordi Pons, Keunwoo Choi, Faro, Justin Salamon,...?
Active Practitioners to Follow
Many of us learn about and contribute to news of new developments, papers, conferences, grants, and networking opportunities via Twitter.
- Audio ML Twitter list by Fabian-Robert Stöter (@faroit). <-- Follow these people!
Quick Quotes
- Justin Salamon: "Anyone working in ML, anyone, should be obliged to curate a dataset before they're allowed to train a single model. The lessons learnt in the process are invaluable, and the dangers of skipping said lessons are manifold (see what I did there?)"
General Reference Information
- Machine Learning Glossary - A reference resource for common ML math topics, definitions, concepts, etc.
- Notes on Music Information Retrieval
Online Courses
- Valerio Velardo's "Deep Learning for Audio"
- Andrew Ng's ML Course on Coursera (Good all-around ML course)
- Fast.ai (Can get you up and running fast)
- Rebecca Fiebrink's Machine Learning for Musicians and Artists on Kadenze (No math!)
- Neural Network Programming - Deep Learning with PyTorch. Learn how to code an image-predictor neural network in PyTorch. Provides practical NN fundamentals.
- Advanced Digital Signal Processing series taught by Dr.-Ing. Gerald Schuller of Fraunhofer IDMT, with videos and accompanying Jupyter notebooks by Renato Profeta
- Foundations of Machine Learning taught by David Rosenberg
Tutorials
(I'm often underwhelmed with audio-specific tutorials, actually. No offense! Feel free to suggest some. Here are a couple on related topics that I've found inspiring)
- Andrew Trask's "Anyone Can Learn To Code an LSTM-RNN in Python"
- Machine Learning & Deep Learning Fundamentals (Good high level intro to ML concepts and how neural networks operate)
Talks (at conferences)
that we found helpful/inspiring (and are hopefully still relevant)
- Paris Smaragdis at SANE 2015: "NMF? Neural Nets? It’s all the same..."
- Ron Weiss at SANE 2015: "Training neural network acoustic models on waveforms"
- Jordi Pons at DLBCN 2018: "Training neural audio classifiers with few data"
- Sander Dieleman at ISMIR 2019: "Generating Music in the Waveform Domain"
Key Papers / Codes
(Let's try to list "representative" or "landmark" papers, not just our latest tweak, unless it includes a really good intro/review section. ;-) )
- Keunwoo Choi et al., "Automatic tagging using deep convolutional neural networks" (ISMIR 2016 Best Paper)
- SampleRNN
- WaveNet
- WaveRNN, i.e. "Efficient Neural Audio Synthesis"
- GANSynth
- Wave-U-Net
Demos
(Not sure if this only means "deployed models you can play with in your browser," or if other things should count as demos)
- Chris Donahue's WaveGAN Demo
- Scott Hawley's SignalTrain Demo
- Neil Zeghidour and David Grangier's Wavesplit
- David Samuel, Aditya Ganeshan, and Jason Naradowsky's Meta-TasNet
Packages & Libraries
- awesome-python-scientific-audio: Curated list of Python software and packages related to scientific research in audio
- Librosa: Great package for various kinds of audio analysis and manipulation
- Audiomentations, data augmentation for audio
- tf.signal: signal processing for TensorFlow
- fastai_audio (and fastai2_audio), audio libraries for the Fast.ai library/MOOC. Fast.ai is primarily for image, text & tabular data processing, but there are efforts to add audio. (Work in progress.)
Tools / GUIs / Gists
- Jesse Engel's gist to plot "rainbowgrams"
Books
- Neural Networks and Deep Learning online book. How drscotthawley first started reading.
Computer-Related Topics
Python:
- learnpython.org
- Python notebooks for fundamentals of music processing
Signal Processing Topics
- Advanced Digital Signal Processing series taught by Dr.-Ing. Gerald Schuller of Fraunhofer IDMT, with videos and accompanying Jupyter notebooks by Renato Profeta
- Yuge Shi's "Gaussian Processes, Not Quite for Dummies"
Statistics / Math Topics
- Gradient Descent
- Principal Component Analysis: "PCA From Scratch" by @drscotthawley
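For readers who have not yet seen gradient descent in code, here is a minimal sketch in plain Python. It is not taken from any of the linked resources; the objective `f(x) = (x - 3)^2`, the function names, and the learning rate are all assumptions chosen only to keep the illustration small.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(n_steps):
        x = x - learning_rate * grad(x)
    return x


# Illustrative objective (an assumption for this sketch): f(x) = (x - 3)^2,
# whose gradient is 2 * (x - 3) and whose minimum sits at x = 3.
def grad_f(x):
    return 2.0 * (x - 3.0)


x_min = gradient_descent(grad_f, x0=0.0)
print(f"estimated minimizer: {x_min:.4f}")
```

Training a neural network follows the same loop in spirit, except that `x` becomes millions of weights and the gradient is computed by backpropagation over batches of data.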
Datasets (raw audio)
One finds that many supposed "audio datasets" are really only features or even just metadata! Here are some "raw audio" datasets:
- NSynth Musical Instruments
- GTZAN Genre Collection (Note critique by Bob Sturm)
- Fraunhofer IDMT Guitar/Bass Effects
- Urban Sound Dataset
- FreeSound Annotator (formerly FreeSound Datasets)
- Birdvox-Full-Night
- SignalTrain LA2A
- Kaggle Heartbeat Sounds
- Search for other audio datasets at Kaggle (list)
- A collated list of MIR datasets can be found here, which is the source for audiocontentanalysis.org, but only some are raw audio
- Another list of "audio datasets" by Christopher Dossman
- ...your dataset here...
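To make the "raw audio" versus "features" distinction concrete, the sketch below synthesizes one second of a sine tone and reduces it to two summary features. Everything here is an assumption for illustration (the 16 kHz sample rate, the 440 Hz tone, and the helper names); real datasets typically ship richer features such as spectrograms or MFCCs, but the point is the same: a features-only dataset cannot be turned back into the waveform.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed for this sketch)


def sine_wave(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return raw audio samples for a pure tone."""
    n = int(duration_s * sample_rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def rms(samples):
    """Root-mean-square level of the signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs where the sign changes."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings / (len(samples) - 1)


raw = sine_wave(440.0, 1.0)  # "raw audio": every sample is kept
features = {"rms": rms(raw), "zcr": zero_crossing_rate(raw)}  # "features": a summary
print(len(raw), "raw samples ->", len(features), "feature values")
```

If you plan to train models on waveforms, or to compute your own features, check that a dataset actually provides the samples and not only the summary.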
"Major" ML-Audio Research/Development Groups
Universities:
(or, "Where should I apply for grad school?")
- QMUL (London)
- UPF (Barcelona)
- CCRMA (Stanford, San Francisco)
- IRCAM (Paris)
- NYU (New York)
Industry:
("Where can I get an internship/job?")
- Google Magenta
- Google Perception (speech publications)
- Adobe
- Spotify
- Increasingly, everywhere. ;-)
Conferences
("Which conference(s) should I go to?" -- asked by a student on the day this doc began)
Audio-Specific
- Long list of music-technology-specific conferences: https://conferences.smcnetwork.org/ (which is referenced from https://github.com/MTG/conferences)
- Audio Engineering Society (AES)
- ASA
- Digital Audio Effects (DAFx)
- ICASSP
- ISMIR
- SANE
- Web Audio Conference (WAC)
- SMC
- LVA/ICA
- Audio Mostly
- WIMP
- DCASE
- CSMC
- MuMe
- ICMC
- CMMR
- IBAC
- MLSP
- Interspeech
- FMA
General ML
- ICLR
- ICML
- NeurIPS
- IJCNN
Journals
("Where can I get published?")
In addition, in machine learning specifically, the tendency is for conference papers to be peer-reviewed and to "count" as journal publications.
Competitions / Benchmarks
Some are yearly, some may be defunct but still interesting.
- MIREX
- SiSEC (Signal Separation Evaluation Campaign)
- Kaggle Heartbeat Sounds
Contributors
If you want your name listed here, you may. ;-)