This is a recipe for learning neural acoustic word embeddings on a subset of Speech Commands v0.02 and DyLNet. The models are explained in greater detail in Settle & Livescu (2016) and Settle et al. (2017):
- S. Settle and K. Livescu, "Discriminative Acoustic Word Embeddings: Recurrent Neural Network-Based Approaches," in Proc. SLT, 2016.
- S. Settle, K. Levin, H. Kamper, and K. Livescu, "Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings," in Proc. Interspeech, 2017.
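Both papers train a recurrent network so that embeddings of spoken segments of the same word end up closer together than embeddings of different words. As a rough illustration only, assuming a triplet-style cosine hinge loss of the form max(0, m + d(anchor, positive) - d(anchor, negative)), the objective can be sketched with NumPy on precomputed embeddings. The function names and the margin value 0.5 are hypothetical, and the exact distance scaling, margin, and negative sampling used in the papers and in code/ may differ.

```python
import numpy as np


def cosine_distance(a, b):
    """Row-wise cosine distance 1 - cos(a, b) between two batches of embeddings."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.sum(a_norm * b_norm, axis=1)


def cos_hinge_loss(anchor, positive, negative, margin=0.5):
    """Mean triplet hinge loss: max(0, margin + d(anchor, positive) - d(anchor, negative)).

    anchor and positive are embeddings of the same word; negative is a different word.
    The margin of 0.5 is an arbitrary placeholder, not a value taken from the papers.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return float(np.mean(np.maximum(0.0, margin + d_pos - d_neg)))
```

The loss is zero once every negative is at least `margin` farther from its anchor than the matching positive, so training only pushes on triplets that are still confusable.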
code/
- Python code to create, run, and save the model
-
Ensure the following dependencies are installed:
- Python 3.6
- TensorFlow 1.5 (with NumPy and SciPy)
- Kaldi
- kaldi-io-for-python
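A quick way to confirm the Python-side dependencies are importable is sketched below. It assumes kaldi-io-for-python is imported as `kaldi_io`; the helper name `missing_modules` is hypothetical, and the script does not check the Kaldi toolkit itself or the exact library versions listed above.

```python
import importlib.util
import sys

# Top-level module names to look for; "kaldi_io" is assumed to be the
# import name of kaldi-io-for-python.
REQUIRED_MODULES = ("numpy", "scipy", "tensorflow", "kaldi_io")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required Python modules were found.")
```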
-
Clone this repository.
-
Download the Speech Commands v0.02 corpus.
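One way to fetch and unpack the corpus from Python is sketched below, assuming the public archive location published by the TensorFlow team (`SC2_URL`). The helper names are hypothetical, and any directory containing the extracted corpus should work as `<corpus_dir>`.

```python
import os
import tarfile
import urllib.request

# Assumed public download location of Speech Commands v0.02.
SC2_URL = "http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz"


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir, creating the directory if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return dest_dir


def download_speech_commands(dest_dir, url=SC2_URL):
    """Download the tarball next to dest_dir (if not already present) and extract it."""
    archive_path = dest_dir.rstrip(os.sep) + ".tar.gz"
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(url, archive_path)
    return extract_archive(archive_path, dest_dir)
```

For example, `corpus_dir = download_speech_commands("speech_commands_v0.02")` leaves the extracted corpus in `speech_commands_v0.02/`, which can then be passed as `<corpus_dir>`.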
-
Navigate to the code directory and run "python main.py -t=0.05 -l=sc2 <corpus_dir>", where <corpus_dir> is the path to the downloaded corpus. This trains, evaluates, and saves a model named "sc2" using 5% of the corpus.
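This README does not say which metric main.py reports. The papers cited above evaluate embeddings on a same-different word discrimination task scored by average precision; assuming that metric, it can be sketched as follows. This is an illustration, not the repository's evaluation code, and the function name is hypothetical.

```python
import numpy as np


def same_different_average_precision(embeddings, labels):
    """Average precision of ranking all embedding pairs by cosine distance.

    A pair counts as a positive when both embeddings carry the same word label.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(x), k=1)  # every unordered pair exactly once
    distances = 1.0 - np.sum(x[i] * x[j], axis=1)
    is_same = labels[i] == labels[j]
    order = np.argsort(distances, kind="mergesort")  # closest pairs first
    hits = is_same[order]
    if not hits.any():
        return 0.0
    # Precision at each rank, averaged over the ranks where a same-word pair appears.
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(np.mean(precision_at_k[hits]))
```

An average precision of 1.0 means every same-word pair is ranked closer than every different-word pair.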