jarfo / gcommands

Speech Commands Recognition using end-to-end deep learning models in PyTorch

Speech Commands Recognition

Training deep learning models on the Google Speech Commands Dataset, implemented in PyTorch.

Features

  • Training and testing basic ConvNets and TDNNs.
  • Standard train, test, and validation folders for the Google Speech Commands Dataset v0.02.
  • Dataset loader for standard Kaldi speech data folders (files and pipes).
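A Kaldi data folder lists its audio in a wav.scp file, where each line is an utterance ID followed either by a file path or by a shell command ending in "|" whose standard output is the audio. The sketch below illustrates that convention only; the helper names parse_wav_scp and read_audio_bytes are hypothetical and are not the repository's loader API.

```python
import subprocess
from typing import Dict, Tuple


def parse_wav_scp(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each utterance ID to ("file", path) or ("pipe", command).

    Hypothetical helper illustrating the Kaldi wav.scp convention.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"malformed wav.scp line: {line!r}")
        utt_id, rest = parts[0], parts[1].strip()
        if rest.endswith("|"):
            # A trailing "|" means: run this command and read audio from its stdout.
            entries[utt_id] = ("pipe", rest[:-1].strip())
        else:
            entries[utt_id] = ("file", rest)
    return entries


def read_audio_bytes(kind: str, value: str) -> bytes:
    """Return raw audio bytes for one parsed entry (hypothetical helper)."""
    if kind == "pipe":
        # shell=True because Kaldi pipe entries are complete shell command lines.
        return subprocess.run(value, shell=True, check=True,
                              stdout=subprocess.PIPE).stdout
    with open(value, "rb") as f:
        return f.read()
```

For example, a file entry such as "utt1 data/yes/a.wav" parses to ("file", "data/yes/a.wav"), while "utt2 sox data/no/b.wav -t wav - speed 0.9 |" parses to a pipe whose command is run through the shell.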

Requirements

SoX is required. To install it on macOS with Homebrew:

brew install sox

On Debian/Ubuntu Linux:

sudo apt-get install sox
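To confirm that SoX ended up on your PATH (for instance, before using Kaldi data folders whose wav.scp entries pipe audio through sox), a small check along these lines may help. The function name sox_version is hypothetical; it relies only on the standard library and on SoX's --version flag.

```python
import shutil
import subprocess
from typing import Optional


def sox_version() -> Optional[str]:
    """Return SoX's version string, or None if the sox binary is not on PATH."""
    path = shutil.which("sox")
    if path is None:
        return None
    result = subprocess.run([path, "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(sox_version() or "sox not found; install it with brew or apt-get")
```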

Usage

Google Speech Commands Dataset (v0.02)

To download and extract the Google Speech Commands Dataset, run the following command:

./download_audio.sh
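The Speech Commands v0.02 archive ships validation_list.txt and testing_list.txt at its root, and every other labelled clip is conventionally used for training. The sketch below shows that standard split, assuming the archive was extracted into a single root directory; it is not necessarily how download_audio.sh builds its folders, and the function name split_speech_commands is hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def split_speech_commands(root) -> Dict[str, List[str]]:
    """Split clips into train/valid/test using the dataset's own list files."""
    root = Path(root)

    def read_list(name: str) -> set:
        # Each list contains relative paths such as "yes/<clip>.wav".
        return set((root / name).read_text().split())

    valid = read_list("validation_list.txt")
    test = read_list("testing_list.txt")
    splits = {"train": [], "valid": [], "test": []}
    for wav in sorted(root.glob("*/*.wav")):
        rel = wav.relative_to(root).as_posix()
        if rel.startswith("_background_noise_/"):
            continue  # long noise recordings, not labelled command clips
        if rel in valid:
            splits["valid"].append(rel)
        elif rel in test:
            splits["test"].append(rel)
        else:
            splits["train"].append(rel)
    return splits
```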

Training

Run python3 run.py --help to see all available parameters and options. For example, to train the VGG16 model:

python3 run.py --arc VGG16 --checkpoint VGG16 --num_workers 10
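When launching several runs from a script, such as the default VGG16 run and the speed-perturbed run listed in the results, the command line can be assembled as a list. Only the flags shown in this README (--arc, --checkpoint, --num_workers, --train_path) are used; the function train_command and the checkpoint name "VGG16_sp" are hypothetical.

```python
import shlex
from typing import List, Optional


def train_command(arc: str = "VGG16", checkpoint: str = "VGG16",
                  num_workers: int = 10,
                  train_path: Optional[str] = None) -> List[str]:
    """Build a run.py command line using only flags documented in the README."""
    cmd = ["python3", "run.py", "--arc", arc, "--checkpoint", checkpoint,
           "--num_workers", str(num_workers)]
    if train_path is not None:
        # Point training at an alternative data folder, e.g. the augmented one.
        cmd += ["--train_path", train_path]
    return cmd


if __name__ == "__main__":
    # "VGG16_sp" is a hypothetical checkpoint name for the augmented run.
    sp = train_command(checkpoint="VGG16_sp", train_path="data/train_training_sp")
    print(shlex.join(sp))  # pass the list to subprocess.run(...) to launch it
```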

Results (Isolated word recognition, Speech Commands v0.02, 36 words)

Accuracy on the validation and test sets using the default parameters (VGG16) and with speed-perturbation data augmentation (VGG16 + sp):

Model        Valid acc.   Test acc.   Parameters and options
VGG16        96.3%        96.4%       default
VGG16 + sp   96.6%        96.7%       --train_path data/train_training_sp
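Because both accuracies are already high, the effect of augmentation is easier to read as a relative error reduction, (err_base - err_new) / err_base. The sketch below derives it from the two rows above; these figures are computed here, not separately reported.

```python
def relative_error_reduction(acc_base: float, acc_new: float) -> float:
    """Percent reduction in error rate when accuracy goes from acc_base to acc_new."""
    err_base = 100.0 - acc_base
    err_new = 100.0 - acc_new
    return 100.0 * (err_base - err_new) / err_base


# (VGG16, VGG16 + sp) accuracies from the results table
results = {"valid": (96.3, 96.6), "test": (96.4, 96.7)}
for split, (base, sp) in results.items():
    print(f"{split}: {relative_error_reduction(base, sp):.1f}% relative error reduction")
```

On the test set the error falls from 3.6% to 3.3%, roughly an 8% relative reduction.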

The augmented training set train_training_sp is a speed-perturbed version of the train_training dataset. It was generated with the Kaldi script perturb_data_dir_speed_3way.sh.
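Kaldi's 3-way speed perturbation keeps each original utterance and adds copies at speed factors 0.9 and 1.1, with utterance IDs prefixed by "sp0.9-" and "sp1.1-" and the audio resampled on the fly by a SoX "speed" pipe in wav.scp. The sketch below approximates only the wav.scp part of that idea; the exact command strings Kaldi writes may differ, the script also rewrites other files (utt2spk, spk2utt, and so on), and the function name speed_perturb_scp is hypothetical.

```python
from typing import Iterable, List, Tuple


def speed_perturb_scp(entries: Iterable[Tuple[str, str]],
                      factors=(0.9, 1.0, 1.1)) -> List[str]:
    """Return wav.scp lines for speed-perturbed copies (approximation of Kaldi's output)."""
    out = []
    for utt_id, src in entries:
        for factor in factors:
            if factor == 1.0:
                out.append(f"{utt_id} {src}")  # original copy kept unchanged
                continue
            if src.endswith("|"):
                # Source is already a pipe: chain another sox stage onto it.
                cmd = f"{src} sox -t wav - -t wav - speed {factor} |"
            else:
                # Source is a file: read it with sox and write WAV to stdout.
                cmd = f"sox -t wav {src} -t wav - speed {factor} |"
            out.append(f"sp{factor}-{utt_id} {cmd}")
    return out
```

Note that the SoX speed effect changes tempo and pitch together, which is what makes it useful as a simple form of speaker and rate variation.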

About

License: Apache License 2.0


Languages

Python 89.9%, Shell 10.1%