rhasspy / larynx

End to end text to speech system using gruut and onnx

Problems pronouncing times and dates

fquirin opened this issue

It looks like the English and German voices fail to pronounce dates and times like these (the only ones I've tested):

English: 4/23/2021, 5:02:54 PM
German: 23.4.2021, 17:02:51

I know this is a widely discussed problem in the TTS field and not easy to solve, but maybe there is some smart Python library that does the work ;-). A small script using regular expressions could be a start, but making this work for every language would probably require some ML-based procedure.

Maybe you are already working on something? ^^

This would be a great enhancement to add to gruut 🙂

I'm not currently working on something for this, so I'd be interested to hear from anyone about approaches that have been tried. My first instinct would be to add date and time regular expressions to each language.yml file, and then use a combination of datetime.strftime and num2words to produce the text.

It'll be important to understand how dates and times are commonly pronounced for each language. Beyond month/day ordering, it matters if it should be e.g., "23 April" or "April 23rd". Any resources on those kinds of things would be very helpful!

In German this topic is a bit of a nightmare, since the pronunciation depends on the words around the date or time and on whether it's singular or plural, etc. 🙈

All TTS developers arrive at this point sooner or later, I guess, and find some way to handle it. I'm not sure whether the Mozilla people have already thought about it. In Mary-TTS it was implemented somehow ... I think ... let me double-check.

I think the most sophisticated way would be to approach this like an NLU problem: collect a large number of sentences, mark the dates and times, define the correct replacements, create some magical ANN, cast magic, and let the system learn the rules 😝 😅
The easy way to get started could be to collect a reasonable number of sentences and program a few hand-made rules ^^.
eSpeak is not very good with dates and times, but at least you can understand the result. It simply replaces "24.04.2021" with "twenty-four dot four dot twothousand-twenty-one".

I just double-checked Mary-TTS's behavior and it's kind of funny ^^. I think they let Java reformat the date, because if you use:

"Today is the 04/23/2021 at 5:02 PM.", it gives "Today is the April the 23rd twenty-twenty-one at five oh-two PM"

and if you flip the numbers like this:

"Today is the 23/04/2021 at 5:02 PM.", you get ... wait for it ... "Today is the November fourth twenty-twenty-two ..." 🤓
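The "November fourth twenty-twenty-two" result is consistent with lenient calendar parsing (the default for Java's SimpleDateFormat): month 23 is not rejected but carried into the next year, giving November 2022. The Python sketch below emulates only that month carry to show the arithmetic; lenient_date is a hypothetical helper, not MaryTTS code.

```python
from datetime import date


def lenient_date(month: int, day: int, year: int) -> date:
    """Build a date, carrying out-of-range months into later years.

    Only month overflow is emulated; an out-of-range day still raises ValueError.
    """
    # Zero-based month index, then move every full 12 months into the year
    years_carry, month_index = divmod(month - 1, 12)
    return date(year + years_carry, month_index + 1, day)


# "23/04/2021" read with a month-first pattern: month=23, day=4
print(lenient_date(23, 4, 2021))  # 2022-11-04
# The intended U.S. reading of 04/23/2021 is unaffected
print(lenient_date(4, 23, 2021))  # 2021-04-23
```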

... well, I guess it's better than nothing :-D

[UPDATE]:

Regular expressions for English in Mary-TTS seem to be rather strict and simple:

\d{2}/\d{2}/\d{4} -> parse date
\d{1,2}:\d{2}\s{0,1}(AM|PM) -> parse time
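To see how strict these two patterns are, the sketch below applies them to whole tokens with Python's re.fullmatch. The patterns are copied from above; matching whole tokens (and the classify helper) is an assumption for illustration, not necessarily how MaryTTS applies them.

```python
import re

# The two Mary-TTS patterns quoted above
DATE_RE = re.compile(r"\d{2}/\d{2}/\d{4}")
TIME_RE = re.compile(r"\d{1,2}:\d{2}\s{0,1}(AM|PM)")


def classify(token: str) -> str:
    """Return 'date', 'time' or 'other' for a whole token."""
    if DATE_RE.fullmatch(token):
        return "date"
    if TIME_RE.fullmatch(token):
        return "time"
    return "other"


for token in ["04/23/2021", "4/23/2021", "5:02 PM", "5:02PM", "5:02 p.m.", "17:02:51"]:
    print(f"{token!r:>14} -> {classify(token)}")
```

Only 04/23/2021, 5:02 PM and 5:02PM are recognized; a single-digit month, a lowercase "p.m." and a 24-hour time with seconds all fall through.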

[UPDATE 2]:

"5:02 p.m." becomes "five oh-two AM PM" in Mary-TTS 😆

If you point me in the right direction I could maybe spend some time playing with regular expressions and parsing for German and English :-)

Some inspiration from the Mary Java test code: link

I've made a first pass at this in a side branch of gruut. It's pretty simplistic, but I'm curious what you think.

For each language (currently just U.S. English), there is a set of datetime pattern triplets. Each triplet contains:

  1. A regular expression for matching the date or time in raw text
    • Groups in the match are joined with a space, so (\d{1,2})/(\d{1,2}) on 4/10 would become 4 10
  2. A strptime format string that parses the match from stage 1
    • Something like %m %d to parse <month> <day>
  3. A strftime format string to turn the parsed datetime into words and numbers
    • Something like %B %d_ordinal to make April 10_ordinal
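The three stages can be sketched in plain Python. The triplet below (a single U.S. date pattern) and the helper names are hypothetical and only mirror the description above; gruut's actual data and code may look different. It assumes the default C locale, so %B yields English month names.

```python
import re
from datetime import datetime

# Hypothetical (regex, strptime format, strftime format) triplet for U.S. English.
# It mirrors the three stages described above but is not gruut's actual data.
PATTERNS = [
    (re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b"), "%m %d %Y", "%B %d_ordinal %Y_year"),
]


def _render(match: re.Match, parse_fmt: str, render_fmt: str) -> str:
    # Stage 1: join the regex groups with a space, e.g. "4/23/2021" -> "4 23 2021"
    joined = " ".join(match.groups())
    try:
        # Stage 2: parse the joined groups with strptime
        parsed = datetime.strptime(joined, parse_fmt)
    except ValueError:
        # Not a real date (e.g. 13/45/2021): leave the text untouched
        return match.group(0)
    # Stage 3: format with strftime; "_ordinal" and "_year" pass through literally
    return parsed.strftime(render_fmt)


def expand_dates(text: str) -> str:
    for pattern, parse_fmt, render_fmt in PATTERNS:
        text = pattern.sub(lambda m: _render(m, parse_fmt, render_fmt), text)
    return text


print(expand_dates("It happened on 4/23/2021."))
```

This prints "It happened on April 23_ordinal 2021_year.", leaving the suffixed numbers for the number converters. Note that %d zero-pads, so 4/4/2021 becomes "April 04_ordinal 2021_year".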

The _ordinal is special in gruut when "number converters" are enabled (--number-converters). Using the num2words package, gruut will transform 1_ordinal into "first", 2_ordinal into "second", etc. A _year converter is also available, which turns 2021 into "twenty twenty one".
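gruut relies on num2words for the _ordinal and _year conversions. To keep the sketch dependency-free, a tiny hand-rolled converter (0 to 99, plus four-digit years) stands in for num2words here; the suffix regexes and function names are assumptions, and real output (e.g. hyphenation) may differ.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
# Final words whose ordinal form is not simply word + "th"
IRREGULAR_ORDINALS = {"one": "first", "two": "second", "three": "third", "five": "fifth",
                      "eight": "eighth", "nine": "ninth", "twelve": "twelfth"}


def cardinal(n: int) -> str:
    """Spell out 0 <= n <= 99."""
    if not 0 <= n <= 99:
        raise ValueError(f"out of range: {n}")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def ordinal(n: int) -> str:
    """Turn the last word of the cardinal into its ordinal form: 23 -> 'twenty third'."""
    *head, last = cardinal(n).split()
    if last in IRREGULAR_ORDINALS:
        last = IRREGULAR_ORDINALS[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"  # twenty -> twentieth
    else:
        last += "th"
    return " ".join(head + [last])


def year(n: int) -> str:
    """Read a four-digit year in pairs: 2021 -> 'twenty twenty one'."""
    high, low = divmod(n, 100)
    if 2000 <= n <= 2009:
        return "two thousand" + (f" {cardinal(low)}" if low else "")
    if low == 0:
        return f"{cardinal(high)} hundred"
    if low < 10:
        return f"{cardinal(high)} oh {cardinal(low)}"
    return f"{cardinal(high)} {cardinal(low)}"


def expand_suffixes(text: str) -> str:
    """Replace N_ordinal and YYYY_year tokens with words."""
    text = re.sub(r"\b(\d{1,2})_ordinal\b", lambda m: ordinal(int(m.group(1))), text)
    return re.sub(r"\b(\d{4})_year\b", lambda m: year(int(m.group(1))), text)


print(expand_suffixes("April 23_ordinal 2021_year"))  # April twenty third twenty twenty one
```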

I have some basic time examples working too (e.g., 5:30 and 5:30 PM). It should be pretty easy to extend this approach to German and other languages.

What do you think of this approach as a first pass?

First impression is good 👍. I'm just a little bit concerned about how to distinguish 4/10 = 10th of April from 4/10 = 4 over 10, and in German 4.10 = vierter Oktober (4th of October) from 4.10 = Vier Komma Eins Null (4 point one zero).

Is it possible to run tests in a very basic environment (without installing all the big libraries)? I could clone the lib and create some test sentences.

[EDIT]
I was just thinking of these German sentences again:
"Es passiert am 4.10.2021" -> "Es passiert am vierten zehnten zweitausendeinundzwanzig" (it happens on the ...)
"Es ist der 4.10.2021" -> "Es ist der vierte zehnte zweitausendeinundzwanzig" (it is the ...)
The two results differ (vierten zehnten vs. vierte zehnte) :-(


Ah, I've thought of that! Kind of a cop-out, but 4/10 and 4.10 will be interpreted as numbers (though 4/10 isn't working just yet). However, 4/10_date and 4.10_date will be interpreted as dates.
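The suffix rule can be sketched as a tiny token classifier: a bare 4/10 or 4.10 stays a number, and only the _date suffix triggers date parsing. The regexes, the return tuples and the assumption that "/" means U.S. month/day while "." means German day.month are all hypothetical; this does not address who adds the suffix.

```python
import re
from datetime import date
from decimal import Decimal

# Hypothetical token rules illustrating the "_date" suffix idea; not gruut's tokenizer.
DATE_TOKEN = re.compile(r"(\d{1,2})([./])(\d{1,2})(?:\2(\d{4}))?_date")
NUMBER_TOKEN = re.compile(r"(\d+)([./])(\d+)")


def interpret(token: str) -> tuple:
    m = DATE_TOKEN.fullmatch(token)
    if m:
        first, sep, second = int(m[1]), m[2], int(m[3])
        year = int(m[4]) if m[4] else None
        # Assumed conventions: "/" is U.S. month/day, "." is German day.month
        month, day = (first, second) if sep == "/" else (second, first)
        # Validity check; a leap year stands in when no year is given, so 2/29 passes
        date(year if year is not None else 2000, month, day)  # raises ValueError
        return ("date", month, day, year)
    m = NUMBER_TOKEN.fullmatch(token)
    if m:
        # Without the suffix, "/" is a fraction ("4 over 10") and "." a decimal point
        if m[2] == "/":
            return ("fraction", int(m[1]), int(m[3]))
        return ("decimal", Decimal(token))
    return ("word", token)


for token in ["4/10", "4.10", "4/10_date", "4.10_date", "4.10.2021_date"]:
    print(token, interpret(token))
```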


Sure, give me some time and I can get a small script together.


Those German sentences seem to me a bit like the cardinal vs. ordinal difference for numbers ("one" vs. "first"). I get around this in gruut by having a default (cardinal in this case) and an _ordinal suffix otherwise.

Any idea what these different senses of a date are called?


I didn't quite get the _date part: where is this suffix generated? Automatically by gruut (some sort of tokenizer)? Because I don't think we can expect users to add it manually :-/


As for what they are called: it's a result of the declension of words, which is basically non-existent in English (except for singular<->plural) 🙈

4.10.2021 (10/4/2021) can be:

"the fourth of October" -> "der vierte Oktober" (nominative)
"on the morning of the fourth of October" -> "am Morgen des vierten Oktobers" (genitive)
"on the fourth of October" -> "am vierten Oktober" (dative)
"I'm thinking of the fourth of October" -> "Ich denke an den vierten Oktober" (accusative)

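Producing these forms is mechanical once the case is known: after a definite article, the ordinal takes -e in the nominative and -en in the other cases (masculine, agreeing with the implied "Tag"). The sketch below takes the case as an explicit argument, in the spirit of a hypothetical suffix like _ordinal; working out the case from the surrounding sentence is the hard part and is not attempted. The simplified genitive month ending is also an assumption.

```python
# German number words, enough to build day-of-month ordinals (1-31)
UNITS = ["", "eins", "zwei", "drei", "vier", "fünf", "sechs", "sieben", "acht", "neun"]
TEENS = {10: "zehn", 11: "elf", 12: "zwölf", 13: "dreizehn", 14: "vierzehn",
         15: "fünfzehn", 16: "sechzehn", 17: "siebzehn", 18: "achtzehn", 19: "neunzehn"}
MONTHS = ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli", "August",
          "September", "Oktober", "November", "Dezember"]
# Article (or contracted preposition "am" = "an dem") used in the examples, per case
ARTICLES = {"nom": "der", "gen": "des", "dat": "am", "acc": "den"}


def cardinal_de(n: int) -> str:
    """Spell out 1 <= n <= 31 in German."""
    if n < 10:
        return UNITS[n]
    if n < 20:
        return TEENS[n]
    tens, ones = divmod(n, 10)
    tens_word = "zwanzig" if tens == 2 else "dreißig"
    if ones == 0:
        return tens_word
    unit = "ein" if ones == 1 else UNITS[ones]
    return f"{unit}und{tens_word}"


def ordinal_stem_de(n: int) -> str:
    """Ordinal stem before the case ending: 4 -> 'viert', 20 -> 'zwanzigst'."""
    irregular = {1: "erst", 3: "dritt", 7: "siebt", 8: "acht"}
    if n in irregular:
        return irregular[n]
    return cardinal_de(n) + ("t" if n < 20 else "st")


def german_date(day: int, month: int, case: str) -> str:
    """Render day.month after a definite article (weak adjective declension)."""
    # Masculine singular after a definite article: -e in nominative, -en otherwise
    ending = "e" if case == "nom" else "en"
    month_name = MONTHS[month - 1]
    if case == "gen" and not month_name.endswith("z"):
        month_name += "s"  # simplified genitive: Oktober -> Oktobers (März left as is)
    return f"{ARTICLES[case]} {ordinal_stem_de(day)}{ending} {month_name}"


for case in ("nom", "gen", "dat", "acc"):
    print(german_date(4, 10, case))
```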
As someone from a land where 4/10/2021 is "the fourth of October, Twenty-Twenty-One", locale-aware handling of dates is a plus... :)

Here are a couple of thoughts on the topic...

Input vs Output Locale

One aspect of this issue that I think hasn't been mentioned yet is that locale-awareness can be split into:

  1. locale of input text (e.g. I'm reading a web site with dates written with hard-coded MM/DD/YYYY-style format.)
  2. locale of output speech (e.g. do I want the output pronounced as presented in the input text or as my locale convention?)
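One way to keep the two locales separate is to parse with the input locale's format and phrase the result with the output locale's convention. The format table, function names and the two English phrasings below are hypothetical illustrations; real locale data (for example Unicode CLDR) is far richer, and the digits would still go through number-to-words conversion afterwards.

```python
from datetime import datetime

# Hypothetical per-locale input formats
INPUT_FORMATS = {"en_US": "%m/%d/%Y", "en_GB": "%d/%m/%Y", "de_DE": "%d.%m.%Y"}
# Explicit month names so the output does not depend on the process locale
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def ordinal_number(n: int) -> str:
    """1 -> '1st', 22 -> '22nd', 13 -> '13th'."""
    suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    if 11 <= n % 100 <= 13:
        suffix = "th"
    return f"{n}{suffix}"


def speak_date(text: str, input_locale: str, output_locale: str) -> str:
    """Parse with the input locale's format, then phrase it the output locale's way."""
    parsed = datetime.strptime(text, INPUT_FORMATS[input_locale])
    month = MONTHS[parsed.month - 1]
    day = ordinal_number(parsed.day)
    if output_locale == "en_US":
        return f"{month} {day}, {parsed.year}"  # e.g. "July 4th, 2022"
    return f"the {day} of {month}, {parsed.year}"  # e.g. "the 4th of July, 2022"


print(speak_date("7/4/2022", "en_US", "en_US"))
print(speak_date("7/4/2022", "en_US", "en_GB"))
```

Keeping the output locale equal to the input locale preserves the American "July 4th, 2022" from the novel example, while an en_GB output locale rephrases the same input as "the 4th of July, 2022".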

While my initial gut feeling is that I'd always want output based on my locale, on reflection it may depend on the content. For example, if I'm reading a novel, it may be truer to the text for the American character to say "I'm looking forward to July Fourth Twenty Twenty-Two".

Limited Set of Valid/Ambiguous Dates

In terms of date parsing, the other aspect to keep in mind is that there's only a limited number of valid dates (or, equivalently, a limited number of ambiguous dates). So, for example, if a possible date is "20/6/2021", it can unambiguously only be "20 June 2021".
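That check is easy to sketch with datetime.date, which rejects impossible dates: try both the month/day and the day/month reading and keep whichever survive. candidate_dates is a hypothetical helper that assumes slash-separated a/b/yyyy input.

```python
from datetime import date


def candidate_dates(text: str) -> set:
    """Every valid calendar date for 'a/b/yyyy' read as month/day or day/month."""
    a, b, year = (int(part) for part in text.split("/"))
    readings = set()
    for month, day in ((a, b), (b, a)):  # month-first, then day-first
        try:
            readings.add(date(year, month, day))
        except ValueError:
            pass  # impossible under this ordering (e.g. month 20)
    return readings


print(candidate_dates("20/6/2021"))  # one reading: 20 June 2021
print(candidate_dates("4/10/2021"))  # two readings: ambiguous
```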

So perhaps a little more "intelligence" is also possible in terms of parsing, and maybe correct parsing is even possible in more circumstances: e.g. when a non-ambiguous date is encountered before an ambiguous one, it could select the correct format for a specific document that contains some ambiguous dates.
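The document-level idea can be sketched as follows: find the first date that only one ordering allows, then use that ordering for every date in the document, falling back to a default when no date decides it. This version scans the whole document rather than only earlier text, and the regex, the MDY default and the function names are assumptions.

```python
import re
from datetime import date

DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def _is_valid(year: int, month: int, day: int) -> bool:
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False


def infer_order(text: str):
    """Return 'MDY' or 'DMY' from the first date only one ordering allows, else None."""
    for m in DATE_RE.finditer(text):
        a, b, year = (int(g) for g in m.groups())
        mdy, dmy = _is_valid(year, a, b), _is_valid(year, b, a)
        if mdy != dmy:  # exactly one ordering is possible
            return "MDY" if mdy else "DMY"
    return None


def resolve_dates(text: str, default: str = "MDY") -> list:
    """Parse every a/b/yyyy date with the ordering inferred for the whole document."""
    order = infer_order(text) or default
    dates = []
    for m in DATE_RE.finditer(text):
        a, b, year = (int(g) for g in m.groups())
        month, day = (a, b) if order == "MDY" else (b, a)
        if _is_valid(year, month, day):  # skip dates impossible under that ordering
            dates.append(date(year, month, day))
    return dates


print(resolve_dates("Posted 20/6/2021, updated 4/10/2021."))
```

For "Posted 20/6/2021, updated 4/10/2021." the first date fixes day/month order, so the second one resolves to 4 October 2021 instead of 10 April.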