Here we give samples of eSpeak-generated audio, synthesized from eSpeak's internal phonetic descriptions. For each sample we give the text, the phonetic description, and the audio output. Our agents use eSpeak's phoneset, which we convert to IPA for display (using lexconvert).
- "Hello World":
hələʊ wəːld
Here we give samples of Tacotron 2 + HiFi-GAN generated audio.
- "Hello World":
HH AH0 L OW1 W ER1 L D
Here we vary the first attribute (s1) and hold the other attributes constant.
- s1 = 0:
ɡoikiksss
- s1 = 1:
sikiksss
- s1 = 2:
iikiksss
- s1 = 3:
iiksssss
- s1 = 4:
aikkssss
Target word | Ground truth | Predicted phones |
---|---|---|
Up | ʌp | ʌvb |
Down | daʊn | daʊ |
Left | lɛft | lɛ |
Right | ɹaɪt | ɹaɪʃjəːn |
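
One way to quantify how far each predicted phone string is from its ground truth is a Levenshtein (edit) distance over the IPA symbols. The sketch below is only an illustration and is not necessarily the metric used in this work: the function name `edit_distance` and the `pairs` dictionary are hypothetical, and the comparison is per Unicode code point, so a length mark such as `ː` counts as its own symbol.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two phone strings, per Unicode code point.

    Assumption: each code point is treated as one phone symbol, which is a
    simplification (diphthongs and length marks are split into parts).
    """
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Ground-truth and predicted phones copied from the table above.
pairs = {
    "Up": ("ʌp", "ʌvb"),
    "Down": ("daʊn", "daʊ"),
    "Left": ("lɛft", "lɛ"),
    "Right": ("ɹaɪt", "ɹaɪʃjəːn"),
}

for word, (truth, predicted) in pairs.items():
    print(f"{word}: {edit_distance(truth, predicted)} edit(s)")
```

Dividing each distance by the length of the ground-truth string would give a per-word phone error rate, which makes short and long targets easier to compare.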