KevinEloff / learning-to-speak

Learning to Speak and Hear Through Multi-Agent Communication over a Continuous Acoustic Channel

eSpeak example audio

Here we give samples of eSpeak-generated audio, produced from eSpeak's internal phonetic descriptions. For each sample we give the text, the phonetic description, and the audio output. Our agents use eSpeak's phoneset, which we convert to IPA for display (using lexconvert).

  • "Hello World": hələʊ wəːld

Tacotron 2 + HiFi-GAN example audio

Here we give samples of Tacotron 2 + HiFi-GAN generated audio.

  • "Hello World": HH AH0 L OW1 W ER1 L D
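The phone sequence above is written in ARPAbet with stress digits, while the eSpeak samples are shown in IPA. As a rough aid for comparing the two, the following minimal sketch maps the ARPAbet string to IPA. The mapping table and the `arpabet_to_ipa` helper are assumptions made for illustration: they cover only the phones in this sample and are not taken from the project's code.

```python
# Illustrative subset of ARPAbet-to-IPA correspondences. This table is an
# assumption for this sketch; it only covers the phones in the sample above.
ARPABET_TO_IPA = {
    "HH": "h", "L": "l", "W": "w", "D": "d",
    "AH": "ʌ", "OW": "oʊ", "ER": "ɝ",
}


def arpabet_to_ipa(phones: str) -> str:
    """Convert a space-separated ARPAbet string to an IPA string."""
    out = []
    for token in phones.split():
        # Vowels carry a trailing stress digit (0, 1, or 2); consonants do not.
        stress = token[-1] if token[-1].isdigit() else ""
        base = token[:-1] if stress else token
        if base == "AH" and stress == "0":
            out.append("ə")  # unstressed AH is conventionally written as schwa
        else:
            out.append(ARPABET_TO_IPA[base])  # KeyError for phones not in the table
    return "".join(out)


ipa = arpabet_to_ipa("HH AH0 L OW1 W ER1 L D")
print(ipa)
```

The ARPAbet string carries no word boundary, so the result is a single unbroken IPA string. It reflects an American English pronunciation, so it is expected to differ from the eSpeak transcription shown earlier.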

Tacotron samples

Here we vary the first (s1) attribute and leave the other attributes constant:

  • s1 = 0: ɡoikiksss
  • s1 = 1: sikiksss
  • s1 = 2: iikiksss
  • s1 = 3: iiksssss
  • s1 = 4: aikkssss

Grounded one-word audio samples

Target word    Ground truth    Predicted phones
Up             ʌp              ʌvb
Down           daʊn            daʊ
Left           lɛft
Right          ɹaɪt            ɹaɪʃjəːn
