castorini / howl

Wake word detection modeling toolkit for Firefox Voice, supporting open datasets like Speech Commands and Common Voice.

Streamline preprocessing pipeline

daemon opened this issue

Data preprocessing is currently split into multiple manual steps:

  1. Download the datasets (where?).
  2. Run run.preprocess_dataset.
  3. Write the corresponding *.lab files using run.export_mfa.
  4. Download Montreal Forced Aligner (MFA) and the corresponding CMU phonetic dictionary.
  5. Run MFA (mfa_align) over the speech corpus.
  6. Convert the output TextGrids to our jsonl format (run.attach_mfa_alignment).

We should make this process easier and document it somewhere.
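
One possible direction, as a rough sketch only: a small driver that chains steps 2, 3, 5, and 6 so they can be run (or dry-run printed) with one call. The helper names `build_pipeline` and `run_pipeline` are hypothetical, and invoking the scripts as `python -m run.<name>` is an assumption. The arguments each command needs are not listed here, so the sketch leaves them to the caller through `extra_args` rather than guessing them. Downloading the datasets, MFA, and the dictionary (steps 1 and 4) is not covered.

```python
import subprocess
import sys
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Step = Tuple[str, List[str]]


def build_pipeline(extra_args: Optional[Dict[str, Sequence[str]]] = None) -> List[Step]:
    """Return the preprocessing commands in the order given in this issue.

    extra_args maps a step name to the arguments appended to its command.
    No defaults are supplied because the real arguments are not stated here.
    """
    extra_args = extra_args or {}
    steps: List[Step] = [
        # Assumption: the run.* scripts are runnable as modules via `python -m`.
        ("preprocess", [sys.executable, "-m", "run.preprocess_dataset"]),
        ("export_mfa", [sys.executable, "-m", "run.export_mfa"]),
        # Assumption: mfa_align is on PATH after MFA has been downloaded.
        ("align", ["mfa_align"]),
        ("attach", [sys.executable, "-m", "run.attach_mfa_alignment"]),
    ]
    return [(name, cmd + list(extra_args.get(name, []))) for name, cmd in steps]


def run_pipeline(
    steps: Sequence[Step],
    runner: Callable[..., object] = subprocess.run,
    dry_run: bool = False,
) -> None:
    """Run each step in order, stopping at the first failing command."""
    for name, cmd in steps:
        print(f"[{name}] {' '.join(cmd)}")
        if not dry_run:
            # check=True makes subprocess.run raise CalledProcessError on failure.
            runner(cmd, check=True)
```

Calling `run_pipeline(build_pipeline(), dry_run=True)` only prints the four commands, which could also serve as a starting point for the documentation.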