ivanvovk / text-frontend-tts

Multilingual text processing API for cleaning, IPA word phonemization, tokenization, and conversion of text into sequences of character IDs, for easy stacking with neural Text-to-Speech models.

1 Installation

Supported OS: Unix only.

The package installs in three steps:

  • Clone the repo: git clone https://github.com/ivanvovk/text-frontend-tts.git
  • Enter the repository root: cd text-frontend-tts
  • Run sh install.sh. The script will:
    • Install all necessary Python dependencies
    • Initialize the phonemizer submodule
    • Download and install the G2P backends (espeak-ng, festival, mbrola) that phonemizer needs in order to work
    • Install phonemizer as a Python package
    • Install text_frontend as a Python package

2 Usage

The API preprocesses text inputs for neural TTS systems, i.e. it converts text into a sequence of character embedding IDs. The package supports both grapheme and phoneme text representations. (Note: grapheme processing does not support word stress, whereas phoneme processing does.)

Code examples:

Import:

from text_frontend import TextFrontend

Initialization:

# Encodes phoneme inputs (use_phonemes=True) without stress marks (with_stress=False)
tf = TextFrontend(text_cleaners=['basic_cleaners'], use_phonemes=True, n_jobs=1, with_stress=False)

To find out how many embeddings to initialize in your TTS neural network, read the number of supported characters from nchars (note: the current API supports only the IPA phoneme scheme):

tf = TextFrontend(use_phonemes=False)  # if using graphemes for encoding
print(tf.nchars)
# Output: 119

tf = TextFrontend(use_phonemes=True)  # if using phonemes for encoding
print(tf.nchars)
# Output: 236
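The value of nchars is what sizes the embedding table of a TTS model. Below is a minimal, framework-free sketch of that idea. The helper names init_embedding_table and embed, the embedding dimension and the random initialization are assumptions made for illustration and are not part of text_frontend. The constant 236 is the phoneme value printed above, and the IDs are the first five from the phoneme example in this README.

```python
import random

def init_embedding_table(nchars, dim, seed=0):
    """Create one randomly initialized vector of size `dim` per character ID."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(nchars)]

def embed(sequence, table):
    """Look up the embedding vector for every ID in the sequence."""
    return [table[char_id] for char_id in sequence]

# Value reported by tf.nchars in this README for use_phonemes=True.
NCHARS_PHONEMES = 236

table = init_embedding_table(NCHARS_PHONEMES, dim=8)  # dim=8 is an arbitrary choice
vectors = embed([153, 45, 42, 225, 89], table)

print(len(table), len(vectors), len(vectors[0]))
# Output: 236 5 8
```

In a real model the table would normally be a trainable embedding layer of your deep learning framework, created with nchars rows.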

Text encoding:

# Encodes grapheme inputs
tf = TextFrontend(text_cleaners=['english_cleaners'], use_phonemes=False)

text = "Mr. User, this is test sentence to check the performance of phonemizer and text-to-sequence encoding."
print(tf.graphemes_to_phonemes(text, lang='en-us'))  # G2P conversion is still available with use_phonemes=False
# Output: "m_ˈɪ_s_t_ɚ_._ _j_ˈuː_z_ɚ_,_ _ð_ɪ_s_ _ɪ_z_ _t_ˈɛ_s_t_ _s_ˈɛ_n_t_ə_n_s_ _t_ə_ _tʃ_ˈɛ_k_ _ð_ə_ _p_ɚ_f_ˈoːɹ_m_ə_n_s_ _ʌ_v_ _f_ˈoʊ_n_m_aɪ_z_ɚ_ _æ_n_d_ _t_ˈɛ_k_s_t_-_ _t_ə_-_ _s_ˈiː_k_w_ə_n_s_ _ɛ_ŋ_k_ˈoʊ_d_ɪ_ŋ_."

sequence = tf.text_to_sequence(text, lang='en-us')
print(sequence)
# Output: [36, 32, 42, 43, 28, 41, 2, 44, 42, 28, 41, 5, 2, 43, 31, 32, 42, 2, 32, 42, 2, 43, 28, 42, 43, 2, 42, 28, 37, 43, 28, 37, 26, 28, 2, 43, 38, 2, 26, 31, 28, 26, 34, 2, 43, 31, 28, 2, 39, 28, 41, 29, 38, 41, 36, 24, 37, 26, 28, 2, 38, 29, 2, 39, 31, 38, 37, 28, 36, 32, 49, 28, 41, 2, 24, 37, 27, 2, 43, 28, 47, 43, 6, 43, 38, 6, 42, 28, 40, 44, 28, 37, 26, 28, 2, 28, 37, 26, 38, 27, 32, 37, 30, 7, 1]

print(tf.sequence_to_text(sequence))  # however, the encoding corresponds to the grapheme representation only
# Output: "mister user, this is test sentence to check the performance of phonemizer and text-to-sequence encoding."

# Encodes phoneme inputs
tf = TextFrontend(text_cleaners=['english_cleaners'], use_phonemes=True, with_stress=True)

text = "Mr. User, this is test sentence to check the performance of phonemizer and text-to-sequence encoding."
print(tf.graphemes_to_phonemes(text, lang='en-us'))
# Output: "m_ˈɪ_s_t_ɚ_._ _j_ˈuː_z_ɚ_,_ _ð_ɪ_s_ _ɪ_z_ _t_ˈɛ_s_t_ _s_ˈɛ_n_t_ə_n_s_ _t_ə_ _tʃ_ˈɛ_k_ _ð_ə_ _p_ɚ_f_ˈoːɹ_m_ə_n_s_ _ʌ_v_ _f_ˈoʊ_n_m_aɪ_z_ɚ_ _æ_n_d_ _t_ˈɛ_k_s_t_-_ _t_ə_-_ _s_ˈiː_k_w_ə_n_s_ _ɛ_ŋ_k_ˈoʊ_d_ɪ_ŋ_."

sequence = tf.text_to_sequence(text, lang='en-us')
print(sequence)
# Output: [153, 45, 42, 225, 89, 135, 127, 122, 137, 89, 5, 135, 76, 159, 42, 135, 159, 137, 135, 225, 87, 42, 225, 135, 42, 87, 165, 225, 77, 165, 42, 135, 225, 77, 135, 55, 87, 160, 135, 76, 77, 135, 147, 89, 38, 83, 153, 77, 165, 42, 135, 104, 139, 135, 38, 123, 165, 153, 217, 137, 89, 135, 133, 165, 151, 135, 225, 87, 160, 42, 225, 6, 135, 225, 77, 6, 135, 42, 141, 160, 35, 77, 165, 42, 135, 158, 40, 160, 123, 151, 159, 40, 7, 1]

print(tf.sequence_to_text(sequence))  # encoding corresponds to phoneme representation
# Output: "m_ˈɪ_s_t_ɚ_ _j_ˈuː_z_ɚ_,_ _ð_ɪ_s_ _ɪ_z_ _t_ˈɛ_s_t_ _s_ˈɛ_n_t_ə_n_s_ _t_ə_ _tʃ_ˈɛ_k_ _ð_ə_ _p_ɚ_f_ˈoːɹ_m_ə_n_s_ _ʌ_v_ _f_ˈoʊ_n_m_aɪ_z_ɚ_ _æ_n_d_ _t_ˈɛ_k_s_t_-_ _t_ə_-_ _s_ˈiː_k_w_ə_n_s_ _ɛ_ŋ_k_ˈoʊ_d_ɪ_ŋ_."
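Since the ID sequences are meant to be stacked into batches for a TTS model, sequences of different lengths have to be padded to a common length first. The sketch below shows one way to do that in plain Python. The function pad_batch and the padding ID 0 are assumptions for illustration; check which ID, if any, text_frontend reserves for padding before relying on this value. The example sequences are shortened, made-up ID lists.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad ID sequences to the length of the longest one.

    Returns the padded batch and the original lengths, which a model
    typically needs in order to mask out the padded positions.
    pad_id=0 is an assumed value, not one confirmed by the package.
    """
    if not sequences:
        return [], []
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [list(seq) + [pad_id] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths

# Two illustrative ID sequences of different lengths.
batch = [[36, 32, 42, 43, 28, 41, 1], [43, 28, 42, 43, 1]]
padded, lengths = pad_batch(batch)

print(padded)
# Output: [[36, 32, 42, 43, 28, 41, 1], [43, 28, 42, 43, 1, 0, 0]]
print(lengths)
# Output: [7, 5]
```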

To clean text without encoding it:

from text_frontend import clean_text

text = "Mr. User, this is test sentence   to check the performance of text cleaning. It costs $0."
print(clean_text(text, ['english_cleaners']))
# Output: "mister user, this is test sentence to check the performance of text cleaning. it costs zero dollars."
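To give a rough idea of the kind of normalization visible in the output above (abbreviation expansion, lowercasing, whitespace collapsing), here is a toy cleaner. It is not the package's implementation: toy_clean and its two-entry abbreviation table are hypothetical, and it does not expand numbers or currency the way english_cleaners does.

```python
import re

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"mr": "mister", "dr": "doctor"}

def toy_clean(text):
    """Expand a few abbreviations, lowercase, and collapse whitespace."""
    for abbr, full in ABBREVIATIONS.items():
        # Match the abbreviation followed by a period, ignoring case.
        text = re.sub(r"\b" + abbr + r"\.", full, text, flags=re.IGNORECASE)
    text = text.lower()
    # Replace any run of whitespace with a single space.
    return re.sub(r"\s+", " ", text).strip()

print(toy_clean("Mr. User, this is test sentence   to check the performance of text cleaning."))
# Output: mister user, this is test sentence to check the performance of text cleaning.
```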

For more details, read the docstrings of the functions you call.

License: MIT License


Languages: Python 95.8%, Shell 4.2%