rhasspy / larynx

End to end text to speech system using gruut and onnx

Sound was lost in french word rez-de-chaussée

alt131 opened this issue

Try to get audio for the French word "rez-de-chaussée".
Here's the command line:
cat << EOF |
fr|rez-de-chaussée.
EOF
/usr/local/bin/larynx \
--debug \
--csv \
--glow-tts /path/fr-fr/siwis-glow_tts \
--hifi-gan /path/hifi_gan/universal_large \
--output-dir /mnt/d/99/voices/ \
--language fr-fr \
--denoiser-strength 0.001

Debug data:
DEBUG:larynx:Words for 'rez-de-chaussée': ['rez-de-chaussée']
DEBUG:larynx:Phonemes for 'rez-de-chaussée': ['#', 'ʁ', 'e', 'd', 'ʃ', 'o', 's', 'e', '#', '‖', '‖']
The phonemes are correct for this word, but there is no 'd' sound in the output audio.

The same thing happens with the word "banc":
DEBUG:larynx:Words for 'banc': ['banc']
DEBUG:larynx:Phonemes for 'banc': ['#', 'b', 'ɑ̃', '#', '‖', '‖']
The sound 'b' was lost.

gomme
DEBUG:larynx:Words for 'gomme': ['gomme']
DEBUG:larynx:Phonemes for 'gomme': ['#', 'ɡ', 'ɔ', 'm', '#', '‖', '‖']
The sound 'ɡ' was lost.

A different problem with 'fille':
DEBUG:larynx:Words for 'fille': ['fille']
DEBUG:larynx:Phonemes for 'fille': ['#', 'f', 'i', 'j', '#', '‖', '‖']
The phonemes are correct, but larynx adds an extra 'e' sound at the end. That's strange.

DEBUG:larynx:Words for 'livre': ['livre']
DEBUG:larynx:Phonemes for 'livre': ['#', 'l', 'i', 'v', 'ʁ', '#', '‖', '‖']
'ʁ' sounds like 'a', but it should sound like 'r'.

DEBUG:larynx:Words for 'table': ['table']
DEBUG:larynx:Phonemes for 'table': ['#', 't', 'a', 'b', 'l', '#', '‖', '‖']

DEBUG:larynx:Words for 'stylo': ['stylo']
DEBUG:larynx:Phonemes for 'stylo': ['#', 's', 't', 'i', 'l', 'o', '#', '‖', '‖']

In both cases the sound 't' was lost.

I think I know where the problem is: I took phonemes.txt for siwis-glow_tts from kathleen-glow_tts, and more words sound correct. Please check it.

It seems related to surrounding words. If you have it say "la table" or "le banc", then the "t" and "b" sounds come through. I'm not sure about "rez-de-chaussée", though. If I modify the lexicon to have the pronunciation "ʁ e d d ʃ o s e", then I hear the "d" sound.

I don't think the surrounding words should affect pronunciation if there is a pause between words. As for "table" or "banc", I think it's a more general bug in the neural network training.
About the word "rez-de-chaussée": there aren't many hyphenated words in French, but there are a lot of phrases like "qu'est-ce que c'est", "Y a-t-il", "êtes-vous?", "sont-ils?", etc.
Simply doubling the 'd' solves that problem, but it could cause problems in other phrases.

This might be partially related to #7

Without handling between-word phenomena like liaisons explicitly in gruut, the model is forced to figure out how to blend across word boundaries (#). I may need some help from a native speaker to understand what really needs to be done here.

I don't agree here. I think liaison can be fixed in gruut, but this issue isn't related to gruut, because gruut gave the right pronunciation for all my examples.

I don't agree with the "fact" that there aren't many hyphenated words; hyphenation is a way of creating new words, so they should not be neglected.

The pronunciation of some may seem "weird", as the liaison is made despite the hyphen.
I am working on a Festival Siwis voice... I have quite a few examples on hand to test my voice.
The first one regarding hyphens is 'porc-épic'.

And your 'porc-épic' is wrong:

DEBUG:larynx:Phonemes for 'porc-épic': ['#', 'p', 'ɔ', 'ʁ', 's', 'e', 'e', 'p', 'i', 'k', '#', '‖', '‖']

We have a lot of proper nouns ("noms propres") in this case. Two examples:
"Pont-à-Mousson", with liaison despite the "-" and the normal POS of "à"
"Bourg-en-Bresse", with liaison despite the "-", and the LIA often pronounced as a 'k'

@ddavout, I suggested using liaison for all hyphenated words if there is no other solution. You didn't agree with that, did you?

Good question ;) There is a misunderstanding here (my fault, I have lost command of my English :( ).
I wanted to say that you should not overlook the importance of hyphens when you study liaison!

I am on your side! I was once perhaps extreme in my thinking: I thought that you could train the voice to understand what a liaison is. (I use Festival and LTS rules.)
Between every suitable pair of words I put an "artificial" word with the POS "LIA" (for liaison), say "tflo" between 'dit' and 'on', and put dit-on in my lexicon: t is the ending of "dit", followed by a UTF-8 character we (normally) never use, and o is the initial of the second word.

It wasn't bad, but as my POS tagging became more reliable (and my understanding of the Token and POS modules a little less weak), I thought I could develop another strategy.
But mind you, unlike you, I'm sure :) I am an old French lady with a lot of time... don't expect me to talk about neural network science...
I am happy to make my neurons work (even if they are slower now) and to use my still-good ears, etc.
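To make the "artificial LIA word" trick described above concrete, here is a minimal Python sketch, not the Festival setup itself. The function name `insert_liaison_tokens` and the private-use character `\ue000` are hypothetical stand-ins (the exact rarely used character isn't recoverable from the thread), and the decision of which pairs to link is delegated to a caller-supplied predicate.

```python
from typing import Callable, List

# Hypothetical stand-in for the "UTF-8 character we never use".
LIAISON_MARK = "\ue000"


def insert_liaison_tokens(
    words: List[str], should_link: Callable[[str, str], bool]
) -> List[str]:
    """Insert an artificial LIA token between each linked pair of words.

    The token is: final letter of the first word + marker + initial letter
    of the second word, so "dit" followed by "on" yields "t" + marker + "o".
    """
    out: List[str] = []
    for i, word in enumerate(words):
        out.append(word)
        if i + 1 < len(words) and should_link(word, words[i + 1]):
            out.append(word[-1] + LIAISON_MARK + words[i + 1][0])
    return out


# Usage: link only the pair ("dit", "on").
tokens = insert_liaison_tokens(["dit", "on"], lambda a, b: (a, b) == ("dit", "on"))
```

The resulting token stream is what a voice would be trained on, so the model sees an explicit symbol wherever a liaison is expected.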

My methods are time-consuming... I will enforce the so-called compulsory liaison rules and propose safe ones.
Between us, there is no such thing as a liaison rule... French people love exceptions! That's why they tolerate pseudo-rules.
Would you say "prix extrême" with a z phoneme? No. Even though you are used to saying it when "prix" is plural.

I'm tracking every liaison in the Siwis prompts to check my "rules"; I have not yet finished the job...

Um... to come back to the hyphen matter.
I have a list of what I call "locutions" that need one hyphen or could logically use one, and a list of those with two or three.
Locution = a ready-made expression, with or without a hyphen.
Example: nuit et jour: liaison with t.
Example: the expression "curriculum vitae" would be read properly with a single entry, the same as the one for "curriculum-vitae", in case somebody (other than me) writes it with a hyphen...

I am not sure I'm being clear, so I'll stop now.
But first, just one word... Yes, I think I should have said nothing more, in fact...
To have some success with my clustergen voice, I didn't follow the "English/American" diktat :) (but I have not yet convinced anybody :))

"A hyphen is not just a punctuation sign, it's a letter."

Just as the French apostrophe is not whitespace.

I think checking every word sequence is a very time-consuming approach.
I described some pseudo-rules for liaison here:
#7 (comment)
and here:
#7 (comment)
Additional rules were described by @tjiho.
I believe it's possible to add more rules plus blacklists and whitelists and reach an accuracy of about 95-99%.

And I now agree that always using a liaison whenever a hyphen is encountered, without pseudo-rules, is not a very good idea.

PS. Hyphens and apostrophes are symbols, like whitespace. They are not letters.

It was an experiment to see whether a voice can be trained to understand something about this "nebulous" matter, the French liaison; an experiment done at a time when my POS module didn't give "satisfactory" results (at all).

Now I have come back to rules, which I call exceptions and exclusions, probably corresponding to your pseudo-rules with blacklists and whitelists.

For the runtime part of the voice, I will only enable compulsory liaisons (without forgetting the case of locutions, whether marked by a hyphen or not).

I have not yet finished simplifying this; for now my rules have several parameters (the written form of the word triggering the liaison and of the word that follows it, and their respective POS). So it's just compilation work.
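The rule parameters listed above (written form and POS of the trigger word and of its follower, with exceptions and exclusions) map naturally onto a small ordered rule table. This Python sketch is an assumption about how such a table could be evaluated, with hypothetical names and two illustrative rules using UPOS-style tags like the ones in the gruut output further down; it is not the Festival implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LiaisonRule:
    """One liaison rule; a field set to None matches anything."""
    first_form: Optional[str]   # written form of the word triggering the liaison
    first_pos: Optional[str]    # its POS tag
    second_form: Optional[str]  # written form of the following word
    second_pos: Optional[str]   # its POS tag
    link: bool                  # True: make the liaison; False: exclusion

    def matches(self, w1: str, p1: str, w2: str, p2: str) -> bool:
        pairs = (
            (self.first_form, w1),
            (self.first_pos, p1),
            (self.second_form, w2),
            (self.second_pos, p2),
        )
        return all(want is None or want == got for want, got in pairs)


def decide_liaison(rules, w1, p1, w2, p2, default=False):
    """Return the decision of the first matching rule, so exceptions go first."""
    for rule in rules:
        if rule.matches(w1, p1, w2, p2):
            return rule.link
    return default


# Illustrative rules only: an exclusion listed before a general rule.
RULES = [
    LiaisonRule("et", None, None, None, link=False),    # never link after "et"
    LiaisonRule(None, "DET", None, "NOUN", link=True),  # les amis, des enfants
]
```

Keeping the rules as data rather than code is what makes the "compilation work" mentioned above possible: exceptions and exclusions can be collected from the prompts and simply prepended to the list.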

I don't know your technology, but if it lets you get good phonemes without knowing very precisely which liaisons are made in the prompts (compulsory, optional, or wrong), the impressive work at https://github.com/juliacarbajal/french_phonologizer/blob/master/phonologize.py should be, IMHO, a better guide.

Tell me how to use your French model (https://github.com/rhasspy/gruut/releases/tag/v0.10.0); then I may be able to test your liaison solution, since with different POS detection, our respective lists are likely to be very different.

I must say mine is not very good at spotting inversions (particularly inversions used for stylistic effect), and there is still work to do to take frequent spelling mistakes into account (e.g., confusion between hyphen and apostrophe, a missing hyphen, etc.).

Sorry, I'm not the author of gruut; I'm just a user like you. You need to ask the developer.

Reassure me: you do use it? Without POS, you can't apply your liaison rules...

If the last symbol of the first word is 's', 'x', 'z', 't' or 'd', and the first symbol of the second word is 'h' or any vowel

I am not sure what you call a symbol, and you don't agree with my concept of a letter (you are not the only one...), but...
your list looks incomplete to me; personally I consider 'c', 'q', 'k', 'g', 'd', 'x', 's', 'z', 'n', 'r', 'p', 'y', 'f'?

It may look excessive to you, but I've met examples in all these cases, and I will find out later whether it's worth treating some of them as exceptions.
Do you include 'y' in your list of vowels?

Yes, "first letter" will be more accurate.
It's not a final list, and I'm thinking of using it only for phrases like "word1_space_word2". For phrases like "c'est" and any others with an apostrophe, I thought about always using liaison. I also haven't decided how to handle "neuf heures", where f->v.
Yes, I'd like to see some exceptions.
Yes, 'y' is included in list of vowels.
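The letter-based condition quoted above can be written as a small predicate. In this Python sketch, 'y' counts as a vowel as confirmed just above, while the set of accented vowels and the function name are assumptions. Note that a letter check alone cannot tell a mute h ("les hommes", liaison) from an aspirated h ("les héros", no liaison), so an exception list would still be needed.

```python
# 'y' is treated as a vowel; the accented letters are an assumption.
VOWELS = set("aeiouyàâäéèêëîïôöùûüÿæœ")
LINKING_FINALS = set("sxztd")


def is_liaison_candidate(first: str, second: str) -> bool:
    """First word ends in s, x, z, t or d; second starts with h or a vowel."""
    if not first or not second:
        return False
    last = first[-1].lower()
    initial = second[0].lower()
    return last in LINKING_FINALS and (initial == "h" or initial in VOWELS)
```

This only flags candidates; whether the liaison is compulsory, optional, or forbidden still depends on POS and on exception lists, as discussed in this thread.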

For phrases like "c'est" and any others with an apostrophe, I thought about always using liaison
I am not sure I follow you on that point.
...word1_space_word2...
I distinguish what I call, probably wrongly, a locution: an association of up to three words that can, at least in theory, be replaced by a single word, i.e., I can attribute a POS to it without doubt. I declare them in my PosLex, and they have an entry in my lexbook if, as is often the case, they don't follow the ordinary rules of pronunciation or liaison, or if they can be personalized (e.g., the Latin locution curriculum vitae, without liaison).
It's true that the f->v transformation is not very common, and at runtime it produces a very tolerable mistake. Personally, I am not sure whether I am wrong to say 'neuf années' without the phoneme v, and I have not even thought to check what our 'Académiciens' have ruled :)
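Deciding that a liaison happens is only half the job: the consonant that surfaces can differ from the written letter (s/x to z, d to t, the f to v of "neuf heures", or the 'k' of "Bourg-en-Bresse" mentioned earlier). The Python sketch below uses a simplified textbook mapping, not gruut's code; the table and the function name are illustrative assumptions, and the caller is assumed to have already decided that liaison applies.

```python
# Simplified: final letter of the first word -> consonant heard in liaison.
LIAISON_CONSONANT = {
    "s": "z", "x": "z", "z": "z",  # les amis, deux ans, chez eux
    "t": "t", "d": "t",            # petit ami, grand homme
    "f": "v",                      # neuf heures, neuf ans
    "g": "k",                      # sang impur, Bourg-en-Bresse
    "n": "n",                      # un ami
    "p": "p",                      # trop aimable
    "r": "ʁ",                      # premier étage
}


def join_with_liaison(first_word, first_phonemes, second_phonemes):
    """Concatenate two words' phonemes, inserting the liaison consonant.

    Assumes the caller has already decided that a liaison should be made.
    """
    consonant = LIAISON_CONSONANT.get(first_word[-1].lower()) if first_word else None
    if consonant is None or (first_phonemes and first_phonemes[-1] == consonant):
        # No liaison consonant, or it is already pronounced (e.g. "par").
        return first_phonemes + second_phonemes
    if consonant == "v" and first_phonemes and first_phonemes[-1] == "f":
        # "neuf" already ends in a pronounced /f/, which turns into /v/.
        return first_phonemes[:-1] + ["v"] + second_phonemes
    return first_phonemes + [consonant] + second_phonemes
```

The special case for "neuf" matters because its final consonant is already pronounced: the liaison changes it rather than adding a new one.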

...word1_space_word2...

The two words must be separated only by a space (not an apostrophe, not a hyphen, etc.). The rule applies only to that case; hyphens get other rules. Maybe they can be combined, I don't know yet.

If possible, I prefer to work with two words at a time, because three words give many more combinations.
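A sketch of the "word1_space_word2" restriction: only adjacent words whose separator is pure whitespace are considered, so apostrophes, hyphens, and punctuation such as commas block the pair. The regex-based definition of a "word" (a run of Unicode letters) and the function name are assumptions for illustration.

```python
import re
from typing import Iterator, Tuple

# A "word" is a run of Unicode letters; anything between two words is a separator.
WORD = re.compile(r"[^\W\d_]+")


def space_separated_pairs(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (word1, word2) for adjacent words separated only by whitespace."""
    matches = list(WORD.finditer(text))
    for left, right in zip(matches, matches[1:]):
        separator = text[left.end():right.start()]
        if separator and separator.isspace():
            yield left.group(), right.group()


pairs = list(space_separated_pairs("les amis, dit-on, sont arrivés"))
```

Here "amis, dit" is excluded by the comma and "dit-on" by the hyphen, leaving only the two-word pairs the rule is meant to handle.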

I can without doubt attribute a POS, I declare them in my PosLex

Are you sure PosLex is 100% accurate?

My Poslex is not 100% accurate, far from it :)
From time to time we disagree, and I need to make some corrections. The latest example, which I've not yet solved:

"à moins d'être...": 'être' is seen as a verb, not a big deal... but for "à moins d'interviewer" the fault is really audible.

To come back to apostrophes and hyphens...
@alt131 , you said
PS. Hyphen, apostrophe are symbols as whitespace. They are not letters.

But don't you feel the need to follow your own tokenizer?

echo "l'amour rend aveugle" | python3 -m gruut fr-fr tokenize | python3 -m gruut fr-fr phonemize
{"id": "", "raw_text": "l'amour rend aveugle", "raw_words": ["l'amour", "rend", "aveugle"], "clean_words": ["l'amour", "rend", "aveugle"], "tokens": [{"text": "l'amour", "pos": "NOUN"}, {"text": "rend", "pos": "VERB"}, {"text": "aveugle", "pos": "ADJ"}], "clean_text": "l'amour rend aveugle", "sentences": [{"raw_text": "l'amour rend aveugle", "raw_words": ["l'amour", "rend", "aveugle"], "clean_words": ["l'amour", "rend", "aveugle"], "tokens": [{"text": "l'amour", "pos": "NOUN"}, {"text": "rend", "pos": "VERB"}, {"text": "aveugle", "pos": "ADJ"}]}], "pronunciations": [["l", "a", "m", "u", "ʁ"], ["ʁ", "ɑ̃"], ["a", "v", "œ", "ɡ", "l"]], "pronunciation": [["l", "a", "m", "u", "ʁ"], ["ʁ", "ɑ̃"], ["a", "v", "œ", "ɡ", "l"]], "pronunciation_text": "l a m u ʁ ʁ ɑ̃ a v œ ɡ l", "mapped_phonemes": {}}

"l'amour" is treated as a "clean_word", and a word is composed of letters, isn't it?

By the way, I doubt you will be able to apply fine-grained rules using POS without working on the tokenizer beforehand.
Just one example: in the sentence 'non-désiré par sa mère, il est resté le mal-aimé',
you get

{"text": "non-désiré", "pos": "PROPN"}

Without the hyphen, you get (IMHO, a better result):

{"text": "désiré", "pos": "VERB"}

It depends. You can look at it from two points of view: linguistics and NLP (natural language processing).
From the NLP point of view, "l'amour" and "non-désiré" are each three tokens (article + apostrophe + amour, and non + hyphen + désiré). After parsing the original sentence, you can use post-processing to combine "non-désiré" into a single word, but I would not do that for "l'amour".
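The NLP view described above (split at apostrophes and hyphens, then re-join hyphenated compounds in post-processing while leaving elisions split) can be sketched in Python as follows. The regex and function names are assumptions, not gruut's tokenizer.

```python
import re
from typing import List

SEPARATORS = {"'", "’", "-"}
# A token is either a run of letters or a single apostrophe/hyphen symbol.
TOKEN = re.compile(r"[^\W\d_]+|['’-]")


def split_tokens(text: str) -> List[str]:
    """NLP-style split: "non-désiré" -> ["non", "-", "désiré"]."""
    return TOKEN.findall(text)


def merge_hyphenated(tokens: List[str]) -> List[str]:
    """Re-join word-hyphen-word sequences; elisions like l ' amour stay split."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        has_left_word = bool(out) and out[-1] not in SEPARATORS
        has_right_word = i + 1 < len(tokens) and tokens[i + 1] not in SEPARATORS
        if token == "-" and has_left_word and has_right_word:
            out[-1] = out[-1] + "-" + tokens[i + 1]
            i += 2
        else:
            out.append(token)
            i += 1
    return out


merged = merge_hyphenated(split_tokens("l'amour non-désiré"))
```

Chained compounds such as "Pont-à-Mousson" are re-joined into one token because each merge leaves a word at the end of the output list.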

The author of gruut may have his own opinion about it, and we have different goals. He wants to process text for speech, and for him, treating "l'amour" or "I've" (I have) as one word is the easy way. I need to separate them into single words and maybe later combine them into phrases, because that's more convenient from the point of view of translation.

PS. "l'amour" and "I've" are not very good examples here. There are a lot of word combinations in French that would make the dictionary huge, like these: je t'ouvre; la légende s'écrire; vous n'allez pas m'envoyer au bagne...

The size of the dictionary doesn't frighten me; once my LTS is trained accordingly, the lexbook will shrink dramatically.
I am more concerned about the POSlex, but I will "help" it recognize "l'amour" the way I want...
but everything has a price.
I want a reliable POS so that I can have simpler run-time liaison rules... but I understand your point of view... I am often excessive...
and I may become more reasonable about "l'amour"... I will think about it, but for "s'", "m'", "n'", etc., I will not move.

The foreign-language point of view helped me make this decision. Why can I use a single word in English, but have to use two in French?
And I am happy when I get something like this straight away:

id _5 ; name il ;  pos PRO:per ; pbreak NB ; liaisonvocalic no ;
id _6 ; name s ; pos CON ; pbreak NB ; liaisonvocalic no ;
id _7 ; name s_en ; pos PRO:ind ; pbreak NB ; liaisonvocalic yes ;
id _8 ; name amuse ; pos VER ; pbreak BB ;

(The "name s ; pos CON" entry is silent; I keep it so as not to disturb the POSlex I trained a long time ago... gruut doesn't have this legacy problem.)

I would have thought that, from the point of view of translation, seeing "n'allez" as a verb would help... but apparently I'm wrong.

You work only with French, but I work with several languages, so I always split words at apostrophes, hyphens, etc., and then use post-processing for phrases like these too:

"tout de suite", "salle de bains", "salle à manger"

as a single entity because this often translates as a single word or a similar phrase.
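The phrase post-processing mentioned above can be sketched as a greedy longest-match merge over a phrase list. The phrase set and the function name are illustrative assumptions; a real system would load its multiword entries from a lexicon.

```python
from typing import List, Set, Tuple

PHRASES: Set[Tuple[str, ...]] = {
    ("tout", "de", "suite"),
    ("salle", "de", "bains"),
    ("salle", "à", "manger"),
}


def merge_phrases(words: List[str], phrases: Set[Tuple[str, ...]] = PHRASES) -> List[str]:
    """Greedily replace the longest known multiword phrase with a single token."""
    max_len = max((len(p) for p in phrases), default=1)
    out: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest window first, down to two words.
        for n in range(min(max_len, len(words) - i), 1, -1):
            window = tuple(w.lower() for w in words[i:i + n])
            if window in phrases:
                out.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            # No phrase starts here: keep the single word.
            out.append(words[i])
            i += 1
    return out


merged = merge_phrases("viens tout de suite dans la salle à manger".split())
```

Trying the longest window first means that, if both "salle de bains" and a shorter "salle de" entry existed, the longer phrase would win.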

Sorry to have been out of this discussion for a while. I'm getting close to a refactored release of gruut (in the refactor branch) as version 1.0. The tokenizer/phonemizer code has been simplified, and a lot more tests have been added.

The French liaison code is here. It only handles a handful of cases, but it will hopefully provide a good start.

Regarding apostrophes and hyphens: gruut's tokenizer has a set of "punctuations" that vary by language (here is the French set). Text is split into tokens by whitespace first, and then further split by punctuation characters (except for some special cases like numbers). The goal is for the final token to be something present in the lexicon.
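The tokenizer behaviour described above (whitespace split first, then punctuation split, keeping numbers and lexicon words whole) can be approximated by this Python sketch. It is not gruut's code: the punctuation set, the number pattern, and the lexicon are hypothetical, and the hyphen is deliberately left out of the punctuation set, in line with the decision later in this thread.

```python
import re
from typing import Iterable, List, Set

NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def tokenize(text: str, punctuations: Iterable[str], lexicon: Set[str]) -> List[str]:
    """Split on whitespace, then on punctuation, keeping lexicon words and numbers whole."""
    symbols = sorted(set(punctuations), key=len, reverse=True)  # longest symbols first
    splitter = re.compile("(" + "|".join(map(re.escape, symbols)) + ")") if symbols else None
    tokens: List[str] = []
    for chunk in text.split():
        if splitter is None or chunk in lexicon or NUMBER.fullmatch(chunk):
            tokens.append(chunk)
        else:
            # The capturing group keeps the punctuation as separate tokens.
            tokens.extend(part for part in splitter.split(chunk) if part)
    return tokens


# Hypothetical French punctuation set (no hyphen, no apostrophe) and lexicon.
FRENCH_PUNCTUATION = {",", ".", "!", "?", ";", ":", "«", "»"}
LEXICON = {"M.", "rez-de-chaussée"}

tokens = tokenize("M. Dupont habite au rez-de-chaussée, 3,5 mètres!", FRENCH_PUNCTUATION, LEXICON)
```

Because the lexicon is checked before splitting, an abbreviation such as "M." survives intact, while "rez-de-chaussée," only loses its trailing comma.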

The final set of tokens is run through a POS tagger model that was trained on the Universal Dependencies CoNLL-U files for each language (my French model was trained on the upos label). If there's a misalignment between the tokenizer and this model, it could definitely cause problems for the liaison code.

Maybe at least the hyphen should be a "punctuation" character for French, so that those words get split into multiple tokens?

Maybe at least the hyphen should be a "punctuation" character for French, so that those words get split into multiple tokens?

I don't know. Even if a hyphenated word has no liaison, it will be pronounced a little faster than two separate words.

Actually, I take that back. The hyphen shows up in the lexicon as part of words, so it needs to be left in.

If you treat the hyphen as a punctuation character, how do you plan to handle liaison in this case?

I decided not to process the hyphen as a punctuation character. It would make things too difficult.

Will you go so far as to say "a hyphen is not just a punctuation sign, it's a letter"? ;)

From gruut's point of view, yes, a French hyphen is a letter ;)

It's a pity that we use the same sign to break words at the end of a line.