synesthesiam / voice2json-evaluate

Dataset evaluation scripts for voice2json

voice2json Evaluation

Automated system for evaluating voice2json on a dataset of WAV files with ground truth transcriptions and intents.

Dataset Format

Datasets are stored in the datasets directory with the following structure:

  • datasets/
    • <dataset name>/
      • profiles/
        • <profile name>/
          • bin/
            • Scripts that override voice2json commands (print-downloads, train-profile, transcribe-wav, and recognize-intent)
            • Arguments are separated by --: those before it are for voice2json, and those after it are for the command
      • wav/
        • Directory with WAV files to transcribe
        • Should match wav_name in truth.jsonl (e.g., wav/XXXX.wav)
      • truth.jsonl
        • JSONL file with one line per WAV file (ground truth)
        • Each line uses the output format of recognize-intent
        • wav_name key should be WAV path relative to dataset directory (e.g., wav/XXXX.wav)
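
The layout above implies two conventions that can be checked or handled with a few lines of code. The following is a minimal sketch, not part of the repository: split_override_args and missing_wavs are hypothetical helper names. It assumes an override script receives its arguments as a flat list containing a single -- separator, and that every line of truth.jsonl is a JSON object with a wav_name key.

```python
import json
from pathlib import Path


def split_override_args(argv):
    """Split an argument list at the first "--".

    Arguments before "--" are meant for voice2json, arguments after it
    are meant for the command. If there is no "--", everything is treated
    as a voice2json argument (an assumption of this sketch).
    """
    if "--" not in argv:
        return list(argv), []
    index = argv.index("--")
    return list(argv[:index]), list(argv[index + 1:])


def missing_wavs(dataset_dir):
    """Return the wav_name values in truth.jsonl that have no file on disk.

    wav_name is resolved relative to the dataset directory, e.g.
    <dataset_dir>/wav/XXXX.wav.
    """
    dataset_dir = Path(dataset_dir)
    missing = []
    with open(dataset_dir / "truth.jsonl", encoding="utf-8") as truth_file:
        for line in truth_file:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            wav_name = entry["wav_name"]
            if not (dataset_dir / wav_name).is_file():
                missing.append(wav_name)
    return missing
```

Calling missing_wavs("datasets/<dataset name>") before an evaluation run would list any ground truth entries whose WAV file is absent; an empty list means every entry has a matching file.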

Running

To get started:

$ git clone https://github.com/synesthesiam/voice2json-evaluate
$ cd voice2json-evaluate
$ make

After the virtual environment is created:

$ make run

When finished, check the results directory for a CSV summary file and reports for each profile. See the profiles directory for all downloaded and generated artifacts.

The evaluation is driven by doit, so re-running make run after adding a new dataset or changing relevant files should recompute only what is necessary. If you get stuck or want to start fresh, run make clean or delete the .doit.db file.
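
To illustrate why doit can skip work that is already done, here is a minimal sketch of a doit task file. It is not the repository's actual task definition: task_count_truth, count_lines, and the results/<dataset>.count.txt output are hypothetical names chosen for the example. The point is that each sub-task declares its input files (file_dep) and output files (targets), which is what lets doit decide whether a task is up to date.

```python
from pathlib import Path

DATASETS_DIR = Path("datasets")
RESULTS_DIR = Path("results")


def count_lines(truth_path, out_path):
    """Write the number of non-empty lines in truth_path to out_path."""
    with open(truth_path, encoding="utf-8") as truth_file:
        count = sum(1 for line in truth_file if line.strip())
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(f"{count}\n", encoding="utf-8")


def task_count_truth():
    """Hypothetical doit task: one sub-task per dataset with a truth.jsonl."""
    for truth in sorted(DATASETS_DIR.glob("*/truth.jsonl")):
        dataset = truth.parent.name
        target = RESULTS_DIR / f"{dataset}.count.txt"
        yield {
            "name": dataset,
            # Re-run only when this input file changes...
            "file_dep": [str(truth)],
            # ...or when this output file is missing.
            "targets": [str(target)],
            "actions": [(count_lines, [str(truth), str(target)])],
        }
```

If saved as dodo.py (doit's default task file name), running doit twice in a row would execute each sub-task the first time and report it as up to date the second time, until a truth.jsonl file changes or a target is deleted.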

License: MIT License


Languages: Python 81.2%, Shell 18.2%, Makefile 0.6%