WenwanChen / pvad

speaker conditioned voice activity detection

pvad

A replication of speaker-conditioned voice activity detection (personal VAD) as described in https://arxiv.org/abs/1908.04284

Classifier: each frame is assigned to one of three classes: {non-speech, target speaker speech, non-target speaker speech}
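
To make the three classes concrete, here is a minimal sketch of how frame-level labels could be derived from speaker-annotated segments. It is an illustration only, not the code in this repository: the class indices, the 10 ms frame shift, and the `frame_labels` helper are all assumptions.

```python
from typing import List, Tuple

# Class indices are an assumption for illustration;
# the repository may use a different order.
NON_SPEECH, TARGET, NON_TARGET = 0, 1, 2


def frame_labels(
    segments: List[Tuple[float, float, str]],
    target: str,
    num_frames: int,
    frame_shift: float = 0.01,  # assumed 10 ms hop between frames
) -> List[int]:
    """Map (start_sec, end_sec, speaker) segments to one label per frame.

    Frames not covered by any segment stay NON_SPEECH. If segments
    overlap, the later segment in the list overwrites the earlier one.
    """
    labels = [NON_SPEECH] * num_frames
    for start, end, speaker in segments:
        first = max(0, int(round(start / frame_shift)))
        last = min(num_frames, int(round(end / frame_shift)))
        cls = TARGET if speaker == target else NON_TARGET
        for i in range(first, last):
            labels[i] = cls
    return labels


# Hypothetical utterance: spk1 is the target, spk2 is an interfering speaker.
example = frame_labels(
    [(0.02, 0.05, "spk1"), (0.07, 0.09, "spk2")],
    target="spk1",
    num_frames=10,
)
print(example)
```

The same audio yields different labels depending on which speaker is chosen as the target, which is what distinguishes this task from ordinary voice activity detection.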

  1. Synthetic dataset generation
    prep4kaldi.sh
    flac_to_wav.sh
    concat.sh concat.py
    augment.py

  2. Prepare target speaker embeddings
    extract_embeddings.py

  3. Extract features and labels
    correct_target_labels.py
    fbank.py
    feature_labels.py

  4. Data loader
    dataloader.py
    dataloader_test.py

  5. Model definition and training
    pvad_training.py

  6. Saved model
    checkpoint_oct22_coswarm.t7

  7. Test
    test.py
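
Steps 2 to 4 produce a target speaker embedding and frame-level features. One way the referenced paper conditions the detector on the speaker is to append the embedding to every frame's acoustic features. The sketch below shows that concatenation with NumPy. Whether the scripts in this repository do exactly this is an assumption, and the dimensions (40 filterbank bins, a 256-dimensional embedding) and the `condition_on_speaker` helper are hypothetical.

```python
import numpy as np


def condition_on_speaker(features: np.ndarray, embedding: np.ndarray) -> np.ndarray:
    """Append the same speaker embedding to every frame.

    features:  (num_frames, feat_dim) acoustic features, e.g. log filterbanks
    embedding: (emb_dim,) fixed-length target speaker embedding
    returns:   (num_frames, feat_dim + emb_dim)
    """
    if features.ndim != 2 or embedding.ndim != 1:
        raise ValueError("expected 2-D features and a 1-D embedding")
    # Repeat the embedding once per frame, then join along the feature axis.
    tiled = np.tile(embedding, (features.shape[0], 1))
    return np.concatenate([features, tiled], axis=1)


# Placeholder inputs with assumed dimensions, not real data.
fbank = np.zeros((100, 40), dtype=np.float32)
dvector = np.ones(256, dtype=np.float32)

conditioned = condition_on_speaker(fbank, dvector)
print(conditioned.shape)  # (100, 296)
```

Because the embedding is constant across frames, the classifier sees the same speaker identity at every time step and only the acoustic part of the input changes.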

Languages: Python 90.2%, Shell 9.8%