Raw waveform adaptation with SincNet

This repository contains code for our ASRU 2019 paper titled "Acoustic model adaptation from raw waveforms with SincNet". The aim is to explore the adaptation of the SincNet layer (filter parameters and amplitudes) to speakers and domains.

The code is a little messy. I hope to clean it up soon, time permitting. If you have any questions or problems, please get in touch.

Much of the code is built on the work by Ondrej for Learning to Adapt.

This work is the result of a collaboration with my co-authors Ondrej Klejch, Erfan Loweimi, Peter Bell, and Steve Renals.

Dependencies

The code has been run with:

Keras 2.2.2
Tensorflow 1.10.0
PyKaldi
Kaldi

Usage

For training from scratch, see experiments/ami/train_sinc_40_flat_6epochs.sh. For speaker adaptation, see experiments/ami/adapt_pfstar_40_flat_speaker_lhuc0+sinc.sh. The layers to be adapted (LHUC0, LHUC1, LHUC0+Sinc, etc.) are determined by an argument to adapt_pfstar_40_flat_speaker.py. The above scripts assume an existing tri3 model of AMI (or of a different dataset). They also look for pdf_counts in the main directory, which is equivalent to e.g. tri3/final.occs.
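For readers unfamiliar with LHUC, the idea behind the adaptable LHUC layers can be sketched as follows. This is a minimal NumPy illustration of the standard LHUC re-scaling (each hidden unit is multiplied by 2 * sigmoid(r), with one speaker-dependent parameter r per unit), not the Keras code in this repository; the function name lhuc_scale and the example values are hypothetical.

```python
import numpy as np

def lhuc_scale(hidden, r):
    """Re-scale hidden activations per unit by 2 * sigmoid(r).

    hidden: array of shape (batch, units)
    r:      speaker-dependent parameters of shape (units,)
    (Hypothetical helper for illustration; not part of this repository.)
    """
    return hidden * (2.0 / (1.0 + np.exp(-r)))

# Example activations for a layer with three units (made-up values).
hidden = np.array([[0.5, -1.0, 2.0]])

# With r = 0 the scale is exactly 1, so the unadapted model is unchanged.
unadapted = lhuc_scale(hidden, np.zeros(3))

# Adaptation would update only r, keeping the network weights fixed.
adapted = lhuc_scale(hidden, np.array([1.0, 0.0, -1.0]))
```

Initialising r to zero means adaptation starts from the speaker-independent model, and the scale for each unit stays within the range (0, 2).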

Citation

For research using this work, please cite:

@inproceedings{Fainberg2019,
  author={Joachim Fainberg and Ondřej Klejch and Erfan Loweimi and Peter Bell and Steve Renals},
  title={{Acoustic Model Adaptation from Raw Waveforms with SincNet}},
  booktitle={ASRU},
  year=2019
}

References

Our work builds on a paper by Ravanelli and Bengio. They have a SincNet implementation for PyTorch.
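As background, a SincNet filter is a band-pass filter defined by a low and a high cutoff frequency, built as the difference of two windowed sinc low-pass filters. The sketch below is a minimal NumPy illustration of that parameterisation, not the implementation in this repository or in the PyTorch one; the function name sinc_bandpass, the amplitude argument, the kernel size and the 16 kHz sample rate are assumptions chosen for the example.

```python
import numpy as np

def sinc_bandpass(f_low_hz, f_high_hz, amplitude=1.0,
                  kernel_size=129, sample_rate=16000):
    """Return the impulse response of one sinc band-pass filter.

    (Hypothetical helper for illustration; all defaults are assumptions.)
    """
    # Time indices in samples, centred on zero.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # Cutoffs normalised to cycles per sample.
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate
    # Difference of two ideal low-pass filters.
    # Note that np.sinc(x) is sin(pi * x) / (pi * x).
    g = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # A Hamming window smooths the truncation of the ideal response.
    return amplitude * g * np.hamming(kernel_size)

# One filter passing roughly 1-2 kHz.
h = sinc_bandpass(1000.0, 2000.0)

# Magnitude response, to check where the filter passes energy.
response = np.abs(np.fft.rfft(h, 1024))
freqs = np.fft.rfftfreq(1024, d=1.0 / 16000)
```

In this view each filter is described by only its two cutoff frequencies and a gain, which is why the filter parameters and amplitudes are compact targets for adaptation.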
