naubull2 / vpunct

An English auto punctuator for voice recognized texts

Voice Input Punctuator

Tries to automatically punctuate voice-recognized text.

Data

The datasets come from the bAbI tasks data and an intent classifier dataset:

Simple questions V2
movie dialogue task
NLU benchmark dataset

Requirements

python3
- spacy, sanic, keras

Howto

You can tag [.!?] either with pattern rules only or with a neural network model.

python punctuate.py -m [neural|pattern]  # defaults to pattern, in which case you won't need keras
python punctuate.py -e                   # evaluate on a movie dialogue dataset
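
To give a feel for what pattern-only tagging can look like, here is a minimal sketch. It is not taken from the vpunct source: the opener word lists and the function name `punctuate_by_pattern` are hypothetical, chosen only to illustrate the idea of picking one of [.!?] from surface patterns.

```python
import re

# Hypothetical rule set for illustration; not the rules used by vpunct.
QUESTION_OPENERS = re.compile(
    r"^(who|what|when|where|why|how|which|is|are|do|does|did|"
    r"can|could|will|would|should)\b",
    re.IGNORECASE,
)
EXCLAMATION_OPENERS = re.compile(
    r"^(wow|oh no|watch out|congratulations)\b",
    re.IGNORECASE,
)


def punctuate_by_pattern(text: str) -> str:
    """Append '.', '!' or '?' to an unpunctuated utterance using simple rules."""
    stripped = text.strip()
    if not stripped:
        return stripped
    # Leave text alone if it already ends with sentence punctuation.
    if stripped[-1] in ".!?":
        return stripped
    if QUESTION_OPENERS.match(stripped):
        return stripped + "?"
    if EXCLAMATION_OPENERS.match(stripped):
        return stripped + "!"
    # Default to a period when no rule fires.
    return stripped + "."


if __name__ == "__main__":
    for line in ["what time is it", "i am going home", "wow that is great"]:
        print(punctuate_by_pattern(line))
```

Rules like these only fire on utterances that open with a known cue, so they are likely to miss questions phrased as statements.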

Performance

The following evaluation is performed on the movie dialogue task corpus.

Total number of lines : 136771
Number of lines with question marks : 45122

Pattern Only

Accuracy  : 0.7358669903196561
Precision : 0.8589784517158818
Recall    : 0.23853109347989895
F1 Score  : 0.3733782002358982
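
As a sanity check on the figures above, the sketch below recomputes F1 from the reported precision and recall, assuming F1 here is the standard harmonic mean of the two. The helper name `f1_score` is introduced for illustration and is not part of the repository.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the Pattern Only results above.
precision = 0.8589784517158818
recall = 0.23853109347989895

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.4f}")
```

Because the harmonic mean is dominated by the smaller term, the low recall pulls F1 well below the precision.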

NNet Only

TBD

Joint Model

TBD

GNU General Public License v3.0
