This quick demo shows how to use Azure Speech Synthesis (TTS) from Unity and how to load and play the generated audio file in-game.
The script/logic is in Assets\Scripts\AudioGenButton.cs, which is used by the Button Manager (inside AudioGenUI > Plane > Canvas).
The Azure Speech Service API key must be set in the UI for the code to work.
The Speech SDK for Unity must be installed manually in your project; it is available at https://aka.ms/csspeech/unitypackage.
The SDK is too large to host on GitHub, so install it yourself: open Window > Package Manager, click the + icon, and select Add package from disk.
Then pick the file you just downloaded with the latest Speech SDK.
You can set up a free sandbox and create an Azure Speech resource to get your API key; follow these instructions to get set up:
The TTS settings can be configured by modifying the config objects in the script. Here is a list of supported languages and voices: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support
TTS concepts and options are explained here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/index-text-to-speech
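As a rough sketch of what adjusting those config objects looks like (assuming the standard Speech SDK C# API; the key, region, voice name, and output path below are placeholders, not values from this repo):

```csharp
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Placeholder key/region -- use your own Azure Speech resource values.
var speechConfig = SpeechConfig.FromSubscription("<your-key>", "<your-region>");

// Pick any voice/language pair from the language-support page linked above.
speechConfig.SpeechSynthesisLanguage = "en-US";
speechConfig.SpeechSynthesisVoiceName = "en-US-JennyNeural";

// Write the synthesized audio to a WAV file instead of the default speaker.
var audioConfig = AudioConfig.FromWavFileOutput("output.wav");
using var synthesizer = new SpeechSynthesizer(speechConfig, audioConfig);
```

Any voice listed on the language-support page can be substituted for the voice name above.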
Make sure to set your Azure Speech resource key in the script's attributes.
To access the Inspector panel, click the Button Manager in the Hierarchy pane on the left. The Inspector shows the AudioGenButton script with its public members; the subscription key and region can be set there.
Note that an Audio Source is passed to the script; the provided audio source uses a mixer you can customize (not needed for this demo).
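The Inspector-visible members described above might look roughly like this in AudioGenButton.cs (field names are illustrative, not necessarily the exact ones in the script):

```csharp
using UnityEngine;
using UnityEngine.UI;

public class AudioGenButton : MonoBehaviour
{
    // Public fields show up in the Inspector when the Button Manager is selected.
    public string SubscriptionKey = "";   // Azure Speech resource key
    public string Region = "westus";      // Azure region of the resource
    public AudioSource audioSource;       // routed through a customizable mixer
    public Button speakButton;            // the UI button the script hooks up
}
```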
The logic is straightforward:
- Our script adds a click event handler for the UI button (in the `Start()` function).
- When the button is pressed, `OnButtonPressed()` is called, which sets up the configuration and caches it (this snippet should really be moved to `Start()`).
- The configuration sets the path we want the Azure Speech TTS API to use when generating the audio file.
- We trigger the speech synthesis and wait for it to finish (an audio file is created in the user's asset folder).
- If everything went well, the file is loaded into an audio clip (via the custom async `GetAudioClip` function), then the audio source is stopped and the new clip is played at full volume.
- We also start a coroutine (code executed in parallel) to reset the button text after 4 seconds.
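The steps above can be sketched as follows. This is a minimal illustration assuming the standard Speech SDK and Unity APIs, not the exact contents of AudioGenButton.cs; field and method names (including the coroutine replacing the custom `GetAudioClip` helper) are hypothetical:

```csharp
using System.Collections;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using UnityEngine;
using UnityEngine.Networking;
using UnityEngine.UI;

public class AudioGenButton : MonoBehaviour
{
    public string subscriptionKey;   // set in the Inspector
    public string region;            // e.g. "westus"
    public AudioSource audioSource;
    public Button speakButton;
    public Text buttonText;

    void Start()
    {
        // Step 1: hook the click handler when the scene loads.
        speakButton.onClick.AddListener(OnButtonPressed);
    }

    async void OnButtonPressed()
    {
        // Steps 2-3: build the config and the output path for the generated file.
        string path = Application.persistentDataPath + "/tts.wav";
        var config = SpeechConfig.FromSubscription(subscriptionKey, region);
        using (var audioOut = AudioConfig.FromWavFileOutput(path))
        using (var synthesizer = new SpeechSynthesizer(config, audioOut))
        {
            // Step 4: trigger synthesis and wait for it to finish.
            var result = await synthesizer.SpeakTextAsync("Hello from Unity!");
            if (result.Reason != ResultReason.SynthesizingAudioCompleted)
                return; // synthesis failed or was canceled
        }

        // Steps 5-6: load the file into an AudioClip, play it, reset the label.
        StartCoroutine(PlayGeneratedClip(path));
    }

    IEnumerator PlayGeneratedClip(string path)
    {
        using (var req = UnityWebRequestMultimedia.GetAudioClip("file://" + path, AudioType.WAV))
        {
            yield return req.SendWebRequest();
            if (req.result != UnityWebRequest.Result.Success)
                yield break;

            AudioClip clip = DownloadHandlerAudioClip.GetContent(req);
            audioSource.Stop();
            audioSource.PlayOneShot(clip, 1.0f); // full volume

            // Step 7: the coroutine keeps running in parallel and
            // resets the button text after 4 seconds.
            yield return new WaitForSeconds(4f);
            buttonText.text = "Generate Audio";
        }
    }
}
```

Loading via `UnityWebRequestMultimedia` is one common way to turn a WAV file on disk into an `AudioClip`; the demo's own `GetAudioClip` helper may do this differently.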