dynamic-time-warping keyword-extraction keyword-spotting

Introduction

Keyword Spotting (KWS) is the task of detecting a pre-defined keyword or phrase in an audio file or an audio stream. The algorithm implemented here uses a sliding Dynamic Time Warping (DTW) approach: a keyword template is compared against successive windows of the input audio, and a low DTW cost indicates a likely match. Refer to this paper for a detailed explanation. You can also view my presentation here.
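As an illustration of the sliding DTW idea, here is a minimal sketch in plain Python. It is not taken from 'sliding_kws.py'. The function names, the Euclidean local cost, the length normalisation, the window length (equal to the template length) and the threshold are all assumptions chosen for clarity, and the paper's variant may differ.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (an assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template, segment):
    """Classic DTW cost between two sequences, normalised by their combined length."""
    m, n = len(template), len(segment)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of template[:i] with segment[:j]
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = frame_distance(template[i - 1], segment[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[m][n] / (m + n)


def sliding_dtw(template, utterance, hop=1, threshold=0.5):
    """Slide a template-length window over the utterance.

    Returns (detected, best_start, best_cost). The threshold is hypothetical
    and would need tuning on real features.
    """
    window = len(template)
    best_start, best_cost = None, float("inf")
    for start in range(0, max(len(utterance) - window, 0) + 1, hop):
        c = dtw_distance(template, utterance[start:start + window])
        if c < best_cost:
            best_start, best_cost = start, c
    return best_cost <= threshold, best_start, best_cost


# Toy one-dimensional "features": the keyword occurs at frame 2 of the utterance.
keyword = [[0.0], [1.0], [2.0]]
audio = [[5.0], [5.0], [0.0], [1.0], [2.0], [5.0]]
print(sliding_dtw(keyword, audio))  # (True, 2, 0.0)
```

In practice the frames would be feature vectors produced by the feature extractor rather than raw numbers, and the threshold would be chosen from held-out data.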

Datasets

TIMIT is used to train a neural network that acts as a feature extractor.
The Google Speech Commands dataset is used for testing the performance of the algorithm.

Instructions

The following Python packages are required: numpy, matplotlib, torch, scipy, python_speech_features, and yaml (PyYAML). The pickle and json modules are also used; both ship with the Python standard library.
For relative paths to work smoothly, please adhere to the following directory structure:

KWS (parent directory)
├── speech (Google Speech Commands)
│	├── bed (example class)
│	├── ...
├── nn
│	├── TIMIT
│	├── TEST
│	├── TRAIN
│	├── models (where trained models are stored)
│	│	├── best.pth (a shallow pre-trained model with ±4 context is included)
│	│	├── (other models)
│	├── (python scripts and config file)
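Before running anything, it can help to confirm that the packages and the directory layout above are in place. The following is a small sketch, not part of the repository. The helper names are hypothetical, and the path list covers only the entries that are unambiguous in the tree above.

```python
import importlib.util
from pathlib import Path

# Third-party modules named in the requirements (yaml is provided by PyYAML).
REQUIRED_MODULES = ["numpy", "matplotlib", "torch", "scipy",
                    "python_speech_features", "yaml"]

# Paths relative to the KWS parent directory, taken from the tree above.
EXPECTED_PATHS = ["speech", "nn", "nn/models", "nn/models/best.pth"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    print("Missing modules:", missing_modules())
    print("Missing paths:", missing_paths("."))
```

Run it from the KWS parent directory; two empty lists mean the relative paths used by the scripts should resolve.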

'dl_model.py' trains the neural-network feature extractor, while 'sliding_kws.py' runs the keyword-spotting experiments and writes the results to a JSON file.
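The bundled 'best.pth' model is described as using ±4 context. A common reading of that phrase is that each input frame is concatenated with its four neighbours on either side before being fed to the network. The sketch below shows that operation in plain Python. The function name and the edge handling (repeating the first and last frames) are assumptions, and 'dl_model.py' may do this differently.

```python
def stack_context(frames, context=4):
    """Concatenate each frame with `context` neighbours on each side.

    Frames beyond either end of the sequence are replaced by the nearest
    edge frame (an assumed padding strategy).
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to a valid frame index
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


# Three one-dimensional frames with ±1 context for readability.
print(stack_context([[1], [2], [3]], context=1))
# [[1, 1, 2], [1, 2, 3], [2, 3, 3]]
```

With ±4 context, each stacked vector is nine times the size of a single frame, so a 13-dimensional frame would become a 117-dimensional input.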

About

Keyword Spotting for detecting a word in an audio file


Languages

Language: Python 100.0%