rhasspy / larynx

End to end text to speech system using gruut and onnx

Siwis good training on bad prompts

ddavout opened this issue · comments

In Siwis, the voice talent rarely pronounces verbs in the conditional mood correctly. For example, she says "il tirait" instead of "il tirerait".

So, despite the correct phonemes:

`DEBUG:larynx:Words for 'il tirerait le premier.': ['il', 'tirerait', 'le', 'premier', '.']
DEBUG:larynx:Phonemes for 'il tirerait le premier.': ['#', 'i', 'l', '#', 't', 'i', 'ʁ', 'ə', 'ʁ', 'ɛ', '#', 'l', 'ə', '#', 'p', 'ʁ', 'ə', 'm', 'j`

I can hear "il tirait le premier".

Is there enough of a pattern that we could automate some prompt corrections and re-train?

I have to compare the prompts I use with the originals. How many prompts do you think you need?

In parl, there are 4 occurrences of "rerai"; the first 3 are affected:
text/part1/neut_parl_s01_0429.txt:
A défaut, je suggérerai à l’Assemblée de le rejeter.

text/part1/neut_parl_s02_0531.txt:
Cela représente, pour ceux qui l’ignoreraient, plus de deux fois le salaire moyen.

text/part1/neut_parl_s02_0589.txt:
Si le travail continue de cette manière, je me retirerai moi aussi.

text/part1/neut_parl_s03_0372.txt:
S’il nous rejoint, je retirerai mon amendement.

The only correct one is:
text/part1/neut_parl_s04_0597.txt:
Je les rencontrerai prochainement, probablement
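
A search like the one above can be sketched in Python. This is only a minimal sketch: the function name `find_occurrences` is hypothetical, and it assumes the transcripts are UTF-8 `.txt` files somewhere under a root folder such as `text/`.

```python
import re
from pathlib import Path


def find_occurrences(root, fragment):
    """Return (path, word, line) for every word that contains `fragment`.

    Assumption: transcripts are UTF-8 .txt files below `root`.
    """
    hits = []
    # \w is Unicode-aware for str patterns, so accented letters stay in the word.
    word_re = re.compile(r"\w*" + re.escape(fragment) + r"\w*", re.IGNORECASE)
    for path in sorted(Path(root).rglob("*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            for match in word_re.finditer(line):
                hits.append((str(path), match.group(0), line.strip()))
    return hits
```

Calling `find_occurrences("text", "rerai")` would then give a list of candidate prompts to check by ear; it cannot tell which ones were actually misread.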

I use Siwis as the "base" model for French, since it's one where I had the most data available. So any corrections to the transcripts will improve it and all of the downstream models when I re-train.

Should I create a repo to share the corrected transcripts, or would you like to do that?

Also, thanks for your effort :)

I have noticed quite a lot of "reading" problems. For my voice I simply changed the prompts, and yes, it improved my voice, particularly where the defects are repeated, of course.
Another example: "erion". Of the 9 occurrences I found, 3 are wrong:

text/part1/neut_parl_s01_0633.txt: gagnerions vs gagnerons
Nous gagnerions beaucoup à examiner ce qui est pratiqué là-bas.

text/part1/neut_parl_s03_0462.txt: oserions vs oserons
Nous n’oserions pas, quant à nous, porter de telles accusations.

text/part1/neut_parl_s03_0622.txt:
Sans eux, nous ne serions pas là aujourd’hui, quoi que l’on pense, quoi que l’on dise.

text/part1/neut_parl_s04_0310.txt:
Nous souhaiterions savoir comment on peut faire.

text/part1/neut_parl_s04_0378.txt:
Je ne vois d’ailleurs pas comment nous le ferions…

text/part1/neut_parl_s06_0096.txt:
Certes, notre pays ne va pas aussi bien que nous le souhaiterions.

text/part1/neut_parl_s06_0666.txt: y is read as e (SAMPA)
Je crois que nous y gagnerions tous.

text/part2/neut_book_s06_0092.txt:
– Pourquoi serions-nous malades, puisqu’il n’y a pas de médecins dans l’île ? répondit très sérieusement Pencroff.

text/part3/emph_parl_s01_0633.txt: gagnerions vs gagnerons
Nous gagnerions BEAUCOUP à examiner ce qui est pratiqué là-bas.
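
Changing a prompt so that it matches what was actually read could be sketched as below. This is a rough sketch, not a tested workflow: `replace_word` and the `corrections` table are hypothetical, and reading "gagnerions vs gagnerons" as (written word, word heard) is an assumption.

```python
import re


def replace_word(text, written, heard):
    """Replace the whole word `written` with `heard`, keeping its capitalisation."""
    pattern = re.compile(r"\b" + re.escape(written) + r"\b", re.IGNORECASE)

    def swap(match):
        found = match.group(0)
        if found.isupper():
            return heard.upper()
        # Keep an initial capital if the original word had one.
        return heard.capitalize() if found[0].isupper() else heard

    return pattern.sub(swap, text)


# Hypothetical correction table: file name -> (written word, word heard)
corrections = {
    "neut_parl_s01_0633.txt": ("gagnerions", "gagnerons"),
    "neut_parl_s03_0462.txt": ("oserions", "oserons"),
}
```

Each entry would still need to be confirmed by listening before the transcript is rewritten.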

A repo is a good idea. Right now I am putting a lot of effort into chasing all these imperfections.
Unlike you, who are helped by larynx, I have to track subtler differences (with a Festival lexicon, one (word, POS) pair corresponds to one entry in the lexicon), and I need to take into account every liaison she makes, whether compulsory, optional or completely wrong.
There are parts that are not read at all (at least ... one part between parentheses) and ...
the wave files are not good enough (in my opinion) for Festival, particularly the ones that I would say were badly truncated by a script not suited to the French phoneset ... That is my "feeling", but one fact stands: the sound i + k is very weak. I inspect the waves with Praat and I am getting more and more selective over time ... but that's another problem.

If it would help you out, I have the prompt alignments too. I trained a French Kaldi model on these same IPA phonemes, and used the alignments in the training labels and to trim the WAV files.