wasertech / verified_sentences_stt

Verified sentences for STT uses, by Data Manager.

Verified Sentences for STT

Verified sentences for STT uses, by Data Manager.

Small sample

Using the Training Wizard for STT, users are presented with some sentences from the NLU data.

Their role is to determine if the sentences can be used to train a language model for STT.

If a sentence is usable, it is accepted (true += 1); otherwise it is rejected (false += 1).
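The tallying step can be sketched as follows. This is a minimal illustration, not the Training Wizard's actual code: the function name `record_review` and the in-memory dictionary are assumptions made for this example, and only the `true`, `false`, and `lang` field names come from the shared results file.

```python
def record_review(reviews, sentence, usable, lang="en"):
    """Add one user's verdict for `sentence` to the `reviews` dict.

    Hypothetical helper: the real Training Wizard may store reviews differently.
    """
    entry = reviews.setdefault(sentence, {"true": 0, "false": 0, "lang": []})
    # Accepted sentences increment "true", rejected ones increment "false".
    entry["true" if usable else "false"] += 1
    # Remember the language the sentence was reviewed in, without duplicates.
    if lang not in entry["lang"]:
        entry["lang"].append(lang)
    return entry


reviews = {}
record_review(reviews, "answer me this", usable=True)
record_review(reviews, "ddgr", usable=False)
```

Each call updates the counters for a single sentence, so several users reviewing the same sentence simply accumulate votes on the same entry.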

At the end of the process, users can share their results, which look like this:

# reviews.toml

["can you lookup something for me please"]
true = 1
false = 0
lang = ["en",]

["answer me this"]
true = 1
false = 0
lang = ["en",]

[ddgr]
true = 0
false = 1
lang = ["en",]

To determine whether a sentence is valid, take the boolean with the higher count.

Check out reviews.toml to see the full data set.

lang is the list of languages in which the sentence has been reviewed, whether it was accepted or rejected.

Usage

This data can be used to quickly train a language model for STT applications (such as Assistant). It can also be used to train a classifier that automates more of the review process.
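One way to prepare the data for both uses is sketched below. It assumes the reviews are already loaded into a dictionary shaped like reviews.toml. The function name `split_reviews`, the language filter, and the policy of skipping tied votes are illustrative assumptions, not part of the project.

```python
def split_reviews(reviews, lang="en"):
    """Return (corpus, labeled) for one language.

    corpus  : accepted sentences, usable as language model training text
    labeled : (sentence, accepted) pairs, usable as classifier training data
    """
    corpus, labeled = [], []
    for sentence, entry in reviews.items():
        if lang not in entry["lang"] or entry["true"] == entry["false"]:
            continue  # other language or tied vote: skipped (assumed policy)
        accepted = entry["true"] > entry["false"]
        labeled.append((sentence, accepted))
        if accepted:
            corpus.append(sentence)
    return corpus, labeled


reviews = {
    "answer me this": {"true": 1, "false": 0, "lang": ["en"]},
    "ddgr": {"true": 0, "false": 1, "lang": ["en"]},
}
corpus, labeled = split_reviews(reviews)
```

The `corpus` list can be written one sentence per line to a text file, while `labeled` keeps rejected sentences as negative examples for a classifier.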

Using the Training Wizard, you can automatically submit a pull request with your results once you are done validating your data.

About

License: GNU General Public License v3.0