Ilyushin / lidbox

End-to-end spoken language identification out of the box. Rewrite in progress for first release (version 1).

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

lidbox

  • Spoken language identification (LId) out of the box using TensorFlow.
  • Models implemented with tf.keras.
  • Metadata handling with pandas DataFrames.
  • High-performance, parallel preprocessing pipelines with tf.data
  • Simple spectral and cepstral feature extraction on the GPU with tf.signal.
  • Average detection cost (C_avg) implemented as a tf.keras.metrics.Metric subclass.
  • Angular proximity loss implemented as a tf.keras.losses.Loss subclass.

Why would I want to use this?

  • You need a simple, deep learning based speech classification pipeline. For example: waveform -> VAD filter -> augment audio data -> serialize all data to a single binary file -> extract log-scale Mel-spectra or MFCC -> use DNN/CNN/LSTM/GRU/attention (etc.) to classify by signal labels
  • You want to train a language vector/embedding extractor model (e.g. x-vector) on large amounts of data.
  • You have a TensorFlow/Keras model that you train on the GPU and want the tf.data.Dataset extraction pipeline to also be on the GPU
  • You want an end-to-end pipeline that uses TensorFlow 2 as much as possible

Why would I not want to use this?

  • You are happy doing everything with Kaldi or some other toolkits
  • You don't want to debug by reading the source code when something goes wrong
  • You don't want to install TensorFlow 2 and configure its dependencies (CUDA etc.)
  • You want to train phoneme recognizers or use CTC

Examples

Installing

Python 3.7 or 3.8 is required.

From source

python3 -m pip install https://github.com/py-lidbox/lidbox/archive/master.zip

Most recent version from PyPI

python3 -m pip install 'lidbox==1.0.0rc0'

TensorFlow

TensorFlow 2 is not included in the package requirements because you might want to do custom configuration to get the GPU working etc.

If you don't want to customize anything and instead prefer something that just works for now, the following should be enough:

python3 -m pip install tensorflow

Editable install

If you plan on making changes to the code, it is easier to install lidbox as a Python package in setuptools develop mode:

git clone --depth 1 https://github.com/py-lidbox/lidbox.git
python3 -m pip install --editable ./lidbox

Then, if you make changes to the code, there's no need to reinstall the package since the changes are reflected immediately. Just be careful not to make changes when lidbox is running, because TensorFlow will use its autograph package to convert some of the Python functions to TF graphs, which might fail if the code changes suddenly.

Citing lidbox

@inproceedings{Lindgren2020,
    author={Matias Lindgren and Tommi Jauhiainen and Mikko Kurimo},
    title={{Releasing a Toolkit and Comparing the Performance of Language Embeddings Across Various Spoken Language Identification Datasets}},
    year=2020,
    booktitle={Proc. Interspeech 2020},
    pages={467--471},
    doi={10.21437/Interspeech.2020-2706},
    url={http://dx.doi.org/10.21437/Interspeech.2020-2706}
}

About

End-to-end spoken language identification out of the box. Rewrite in progress for first release (version 1).

License:MIT License


Languages

Language:Python 99.5%Language:JavaScript 0.3%Language:Makefile 0.2%