nabinbhandari / itnpy

A simple, deterministic, and easily extendable approach to inverse text normalization (ITN) for numbers.

Home Page: https://pypi.org/project/itnpy/

Inverse Text Normalization

A simple, deterministic, and extensible approach to inverse text normalization (ITN) for numbers.

Overview

This package converts raw spoken-form text (speech recognition output) into user-friendly written-form text. It works best for converting spoken numbers into digits, and for other translation tasks that do not change word order. A CSV file defines the basic rules for transforming spoken tokens into written tokens, and extra pre- and post-processing may be applied for more specific formatting requirements, e.g. dates, measurements, and money.


These examples were produced by running this script.
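
To make the idea in the Overview concrete, here is a minimal, self-contained sketch of rule-driven conversion that preserves word order, followed by a small post-processing step for money. It is not the package's implementation and does not use its API: the rule table, its column names (`spoken`, `written`, `kind`), and the functions `load_rules`, `convert`, and `format_money` are all hypothetical and exist only for illustration.

```python
import csv
import io
import re

# Hypothetical rule table: the column names and rows are illustrative
# assumptions, not the CSV shipped with the package.
RULES_CSV = """spoken,written,kind
one,1,unit
two,2,unit
three,3,unit
five,5,unit
thirteen,13,unit
twenty,20,tens
thirty,30,tens
hundred,100,scale
thousand,1000,scale
"""


def load_rules(text):
    """Map each spoken token to a (value, kind) pair."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["spoken"]: (int(row["written"]), row["kind"]) for row in reader}


def convert(text, rules):
    """Replace runs of number words with digits, leaving other tokens in place."""
    out = []
    total = current = 0
    prev = None  # kind of the previous number word; None outside a number

    def flush():
        # Emit the number collected so far (if any) and reset the state.
        nonlocal total, current, prev
        if prev is not None:
            out.append(str(total + current))
        total = current = 0
        prev = None

    for token in text.split():
        rule = rules.get(token.lower())
        if rule is None:
            flush()
            out.append(token)
            continue
        value, kind = rule
        if kind == "scale":
            if prev is None:
                current = 1  # a bare "hundred" counts as 100
            if value == 100:
                current *= 100
            else:
                total += current * value
                current = 0
        else:
            # A word extends the current number at its start, after a scale
            # word, or when a single digit follows a tens word ("twenty three").
            extends = prev in (None, "scale") or (
                prev == "tens" and kind == "unit" and value < 10
            )
            if not extends:
                flush()  # e.g. "one two three" becomes three separate numbers
            current += value
        prev = kind
    flush()
    return " ".join(out)


def format_money(text):
    """Post-processing assumption: rewrite '<digits> dollars' as '$<digits>'."""
    return re.sub(r"\b(\d+) dollars\b", r"$\1", text)


RULES = load_rules(RULES_CSV)
print(convert("i have twenty three apples", RULES))  # i have 23 apples
print(format_money(convert("it costs twenty three dollars", RULES)))  # it costs $23
```

Keeping the number logic table-driven means new words can be supported by adding rows, while formatting concerns such as currency stay in separate post-processing functions.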

Installation

This package supports Python versions >= 3.7.

To install from PyPI:

$ pip install itnpy

To install locally:

$ pip install -e .

Tests

To run tests, use pytest in the root folder of this repository:

$ ls
LICENSE			assets			scripts			src
README.md		requirements.txt	setup.py		tests

$ pytest

Issues

This package has been verified on a limited set of test cases. If you find a translation mistake, feel free to open a pull request that adds the input, expected output, and mistake to failing.csv. Thanks!

Citation

If you find this work useful, please consider citing it.

@misc{hsu2022itn,
  title        = {A simple, deterministic, and extensible approach to inverse text normalization for numbers},
  author       = {Brandhsu},
  howpublished = {https://github.com/Brandhsu/itnpy},
  year         = {2022}
}

License: MIT License