methi1999 / KWS

Keyword Spotting for detecting a word in an audio file

Introduction

Keyword Spotting (KWS) is the task of detecting a predefined keyword or phrase in an audio file or an audio stream. The implemented algorithm uses a sliding Dynamic Time Warping (DTW) approach. Refer to this paper for a detailed explanation. You can also view my presentation here.
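To make the sliding DTW idea concrete, here is a minimal sketch in plain numpy. It is not the code in 'sliding_kws.py': the function names (`dtw_cost`, `sliding_dtw`), the Euclidean frame distance, the window length equal to the query length, and the `(n + m)` normalisation are all assumptions chosen for illustration. The idea is to slide a window over the utterance, compute a DTW alignment cost between the keyword template and each window, and report the window with the lowest cost.

```python
import numpy as np


def dtw_cost(query, window):
    """Length-normalised DTW cost between two feature sequences.

    query: (n, d) array, window: (m, d) array of frame-level features.
    """
    n, m = len(query), len(window)
    # Euclidean distance between every query frame and every window frame.
    dist = np.linalg.norm(query[:, None, :] - window[None, :, :], axis=2)
    # acc[i, j] = cheapest cost of aligning query[:i] with window[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m] / (n + m)


def sliding_dtw(query, utterance, hop=1):
    """Slide a window of len(query) frames over the utterance.

    Returns (best_start_frame, best_cost, all_costs).
    """
    win = len(query)
    if len(utterance) < win:
        raise ValueError("utterance is shorter than the query")
    starts = range(0, len(utterance) - win + 1, hop)
    costs = np.array([dtw_cost(query, utterance[s:s + win]) for s in starts])
    best = int(np.argmin(costs))
    return best * hop, float(costs[best]), costs


# Toy example with random 13-dimensional features (synthetic, not real audio):
rng = np.random.default_rng(0)
keyword = rng.normal(size=(20, 13))
utterance = rng.normal(size=(100, 13))
utterance[40:60] = keyword  # plant the keyword at frame 40
start, cost, _ = sliding_dtw(keyword, utterance)
print(start, cost)
```

In practice the keyword would be declared present when the best cost falls below a threshold tuned on held-out data. That threshold, like the hop size, is a hypothetical parameter here and not a value taken from this repository.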

Datasets

  1. TIMIT is used for training a Neural Network which acts as a feature extractor.
  2. The Google Speech Commands dataset is used for testing the performance of the algorithm.

Instructions

  1. The following Python packages are required: numpy, matplotlib, pickle, torch, json, scipy, python_speech_features, yaml (pickle and json ship with the Python standard library; yaml is provided by the PyYAML package).
  2. For relative paths to work smoothly, please adhere to the following directory structure:
KWS (parent directory)
├── speech (Google Speech Commands)
│	├── bed (example class)
│	├── ...
├── nn
│	├── TIMIT
│	├── TEST
│	├── TRAIN
│	├── models (where trained models are stored)
│	│	├── best.pth (a shallow pre-trained model with ±4 context is included)
│	│	├── (other models)
│	├── (python scripts and config file)
  3. 'dl_model.py' trains the Neural Network feature extractor, while 'sliding_kws.py' runs the actual experiments and dumps a JSON file containing the results.
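The "±4 context" noted for best.pth in the tree above is not spelled out in this README. A common reading, assumed here, is that each input frame is concatenated with its four preceding and four following frames before being fed to the network. The sketch below illustrates that reading only; `stack_context` is a hypothetical helper and the edge padding (repeating the first and last frame) is an assumption, so the actual preprocessing in 'dl_model.py' may differ.

```python
import numpy as np


def stack_context(frames, context=4):
    """Concatenate each frame with `context` neighbours on either side.

    frames: (T, d) array of per-frame features.
    Returns a (T, (2 * context + 1) * d) array. Edges are padded by
    repeating the first/last frame (an assumption, not the repo's choice).
    """
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    T = len(frames)
    # Block k holds the frame at offset (k - context) relative to the centre.
    return np.concatenate(
        [padded[k:k + T] for k in range(2 * context + 1)], axis=1
    )


# 10 frames of 3-dimensional features become 10 frames of 27 dimensions.
feats = np.arange(30, dtype=float).reshape(10, 3)
stacked = stack_context(feats, context=4)
print(stacked.shape)
```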
