Plugin: AzSpeech

About the project

A plugin integrating Azure Speech Cognitive Services to Unreal Engine with simple functions which can do these asynchronous tasks:

Voice-To-Text (Convert a speech into a string)
Text-To-Voice (Convert a string into a speech)
Text-To-Wav (Convert a string into a .wav audio file)
Wav-To-Text (Convert a .wav audio file into a string)
Text-To-Stream (Convert a string into a audio data stream)

And helper functions:

Runtime USoundWave importer via Audio File
Runtime USoundWave importer via Audio Data Stream

Links

Marketplace: AzSpeech - Text and Voice
Forum: [FREE] AzSpeech plugin: Async Text-to-Voice and Voice-to-Text with Microsoft Azure
Microsoft Documentation: Speech Service Documentation

Documentation

Installation

Just clone this repository or download the .zip file in the latest version and drop the files inside the 'Plugins/' folder of your project folder. Note that if your project doesn't have a 'Plugins' folder, you can create one.

Blueprint Usage

You have these asynchronous functions to manage all the workaround:

Voice to Text Async: Will convert your speech to string. Not that this function will get the speech by your default sound input device.
1.1. Parameters: Microsoft Azure parameters to connect to the service and perform the tasks. The structure AzSpeechData represents this input;

2. Text to Voice Async: Will convert the specified text to a sound that will be played by the default sound output device.
2.1. Text to Convert: The text that will be converted to audio;
2.2. Voice Name: Voice code that will represent the type of voice that will "read" the converted text. You can see all names here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#text-to-speech;
2.3. Parameters: Microsoft Azure parameters to connect to the service and perform the tasks. The structure AzSpeechData represents this input;

3. Text to WAV Async: Will convert the specified text to a .wav file.
3.1. Text to Convert: The text that will be converted to audio;
3.2. File Path: Output path to save the audio file;
3.3. File Name: Output file name;
3.4. Voice Name: Voice code that will represent the type of voice that will "read" the converted text. You can see all names here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#text-to-speech;
3.5. Parameters: Microsoft Azure parameters to connect to the service and perform the tasks. The structure AzSpeechData represents this input;

4. WAV to Text Async: Will convert the specified .wav file into a string.
4.1. File Path: Input path of the audio file;
4.2. File Name: Input file name;
4.3. Parameters: Microsoft Azure parameters to connect to the service and perform the tasks. The structure AzSpeechData represents this input;

5. Text to Stream Async: Will convert the specified text to a data stream.
5.1. Text to Convert: The text that will be converted to stream;
5.2. Voice Name: Voice code that will represent the type of voice that will "read" the converted text. You can see all names here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#text-to-speech;
5.3. Parameters: Microsoft Azure parameters to connect to the service and perform the tasks. The structure AzSpeechData represents this input;

6. AzSpeechData: Represents Microsoft Azure parameters to connect to the service and perform the tasks;
6.1. API Access Key: It's your Speech Service API Access Key from your Microsoft Azure Portal - Speech Service Panel;
6.2. Region ID: Speech Service Region from your Microsoft Azure Portal. You can see all regions here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/regions;
6.3. Language ID: Language to apply lozalization settings. You can see all IDs here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#speech-to-text;

7. Convert File To Sound Wave: Will load a specified audio file and transform into a transient USoundWave.
7.1. File Path: Input path of the audio file;
7.2. File Name: Input file name;

8. Convert Stream To Sound Wave: Will convert the specified data stream into a transient USoundWave.
8.1 Raw Data: Data stream of a specified audio.

C++ Usage

You need to include the module "AzSpeech" inside your .Build.cs class to be allowed to call AzSpeech functions from C++.

Text-to-Voice
#include "AzSpeech/TextToVoiceAsync.h"

void AMyExampleActor::Example()
{
  FAzSpeechData MyNewData;
  MyNewData.LanguageID = "LanguageID";
  MyNewData.RegionID = "RegionID";
  MyNewData.APIAccessKey = "APIAccessKey";

  UTextToVoiceAsync* MyTask = UTextToVoiceAsync::TextToVoiceAsync(this, "Text to Convert",
  	                                                                "Voice Name", MyNewData);

  MyTask->TaskCompleted.AddDynamic(this, &AMyExampleActor::ExampleCallback);

  MyTask->Activate();
}

void AMyExampleActor::ExampleCallback(const bool TaskResult)
{
}
UTextToVoiceAsync::TextToVoiceAsync(WorldContextObject, TextToConvert, VoiceName, FAzSpeechData): Will convert the specified text to a sound that will be played by the default sound output device.
1.1. WorldContextObject: Task Owner.
1.2. TextToConvert: The text that will be converted to audio;
1.3. VoiceName: Voice code that will represent the type of voice that will "read" the converted text. You can see all names here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#text-to-speech;
1.4. FAzSpeechData: Microsoft Azure parameters to connect to the service and perform the tasks. The structure AzSpeechData represents this input;

FAzSpeechData: Represents Microsoft Azure parameters to connect to the service and perform the tasks;
2.1. APIAccessKey: It's your Speech Service API Access Key from your Microsoft Azure Portal - Speech Service Panel;
2.2. RegionID: Speech Service Region from your Microsoft Azure Portal. You can see all regions here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/regions;
2.3. LanguageID: Language to apply lozalization settings. You can see all IDs here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#speech-to-text;

The TextToVoiceAsync function return a delegate to manage task status which allow you to bind a function to it's delegate to handle task's completion call.

Voice-to-Text
#include "AzSpeech/VoiceToTextAsync.h"

void AMyExampleActor::Example()
{
  FAzSpeechData MyNewData;
  MyNewData.LanguageID = "LanguageID";
  MyNewData.RegionID = "RegionID";
  MyNewData.APIAccessKey = "APIAccessKey";

  UVoiceToTextAsync* MyTask = UVoiceToTextAsync::VoiceToTextAsync(this, MyNewData);

  MyTask->TaskCompleted.AddDynamic(this, &AMyExampleActor::ExampleCallback);

  MyTask->Activate();
}

void AMyExampleActor::ExampleCallback(const FString& TaskResult)
{
}
UVoiceToTextAsync::VoiceToTextAsync(WorldContextObject, FAzSpeechData): Will convert the specified text to a sound that will be played by the default sound output device.
1.1. WorldContextObject: Task Owner.
1.2. FAzSpeechData: Microsoft Azure parameters to connect to the service and perform the tasks. The structure AzSpeechData represents this input;

FAzSpeechData: Represents Microsoft Azure parameters to connect to the service and perform the tasks;
2.1. APIAccessKey: It's your Speech Service API Access Key from your Microsoft Azure Portal - Speech Service Panel;
2.2. RegionID: Speech Service Region from your Microsoft Azure Portal. You can see all regions here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/regions;
2.3. LanguageID: Language to apply lozalization settings. You can see all IDs here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#speech-to-text;

The VoiceToTextAsync function return a delegate to manage task status which allow you to bind a function to it's delegate to handle task's completion call.

Text-to-WAV
#include "AzSpeech/TextToWavAsync.h"

void AMyExampleActor::Example()
{
  FAzSpeechData MyNewData;
  MyNewData.LanguageID = "LanguageID";
  MyNewData.RegionID = "RegionID";
  MyNewData.APIAccessKey = "APIAccessKey";

  UTextToWavAsync* MyTask = UTextToWavAsync::TextToWavAsync(this, "Text to Convert",
  	                                                          "File Path", "File Name",
  	                                                          "Voice Name", MyNewData);

  MyTask->TaskCompleted.AddDynamic(this, &AMyExampleActor::ExampleCallback);

  MyTask->Activate();
}

void AMyExampleActor::ExampleCallback(const bool TaskResult)
{
}
Text to WAV Async: Will convert the specified text to a .wav file.
1.1. WorldContextObject: Task Owner.
1.2. TextToConvert: The text that will be converted to audio;
1.3. FilePath: Output path to save the audio file;
1.4. FileName: Output file name;
1.5. VoiceName: Voice code that will represent the type of voice that will "read" the converted text. You can see all names here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#text-to-speech;
1.6. FAzSpeechData: Microsoft Azure parameters to connect to the service and perform the tasks. The structure AzSpeechData represents this input;

FAzSpeechData: Represents Microsoft Azure parameters to connect to the service and perform the tasks;
2.1. APIAccessKey: It's your Speech Service API Access Key from your Microsoft Azure Portal - Speech Service Panel;
2.2. RegionID: Speech Service Region from your Microsoft Azure Portal. You can see all regions here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/regions;
2.3. LanguageID: Language to apply lozalization settings. You can see all IDs here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#speech-to-text;

The TextToWavAsync function return a delegate to manage task status which allow you to bind a function to it's delegate to handle task's completion call.

WAV-to-Text
#include "AzSpeech/WavToTextAsync.h"

void AMyExampleActor::Example()
{
  FAzSpeechData MyNewData;
  MyNewData.LanguageID = "LanguageID";
  MyNewData.RegionID = "RegionID";
  MyNewData.APIAccessKey = "APIAccessKey";

  UWavToTextAsync* MyTask = UWavToTextAsync::WavToTextAsync(this, "FilePath", "FileName", MyNewData);

  MyTask->TaskCompleted.AddDynamic(this, &AMyExampleActor::ExampleCallback);

  MyTask->Activate();
}

void AMyExampleActor::ExampleCallback(const FString& TaskResult)
{
}
UWavToTextAsync::WavToTextAsync(WorldContextObject, FilePath, FileName, FAzSpeechData): Will convert the specified .wav file into a string.
1.1. WorldContextObject: Task Owner.
1.2. FilePath: Input path of the audio file.
1.3. FileName: Input file name.
1.4. FAzSpeechData: Microsoft Azure parameters to connect to the service and perform the tasks. The structure AzSpeechData represents this input;

FAzSpeechData: Represents Microsoft Azure parameters to connect to the service and perform the tasks;
2.1. APIAccessKey: It's your Speech Service API Access Key from your Microsoft Azure Portal - Speech Service Panel;
2.2. RegionID: Speech Service Region from your Microsoft Azure Portal. You can see all regions here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/regions;
2.3. LanguageID: Language to apply lozalization settings. You can see all IDs here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#speech-to-text;

The WavToTextAsync function return a delegate to manage task status which allow you to bind a function to it's delegate to handle task's completion call.

Text-to-Stream
#include "AzSpeech/TextToStreamAsync.h"

void AMyExampleActor::Example()
{
  FAzSpeechData MyNewData;
  MyNewData.LanguageID = "LanguageID";
  MyNewData.RegionID = "RegionID";
  MyNewData.APIAccessKey = "APIAccessKey";

  UTextToStreamAsync* MyTask = UTextToStreamAsync::TextToStreamAsync(this, "TextToConvert", "VoiceName", MyNewData);

  MyTask->TaskCompleted.AddDynamic(this, &AMyExampleActor::ExampleCallback);

  MyTask->Activate();
}

void AMyExampleActor::ExampleCallback(const TArray<uint8>& TaskResult)
{
}
UTextToStreamAsync::TextToStreamAsync(WorldContextObject, TextToConvert, VoiceName, FAzSpeechData): Will convert the specified text file into a audio data stream.
1.1. WorldContextObject: Task Owner.
1.2. TextToConvert: The text that will be converted to data stream;
1.3. VoiceName: Voice code that will represent the type of voice that will "read" the converted text. You can see all names here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#text-to-speech;
1.4. FAzSpeechData: Microsoft Azure parameters to connect to the service and perform the tasks. The structure AzSpeechData represents this input;

FAzSpeechData: Represents Microsoft Azure parameters to connect to the service and perform the tasks;
2.1. APIAccessKey: It's your Speech Service API Access Key from your Microsoft Azure Portal - Speech Service Panel;
2.2. RegionID: Speech Service Region from your Microsoft Azure Portal. You can see all regions here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/regions;
2.3. LanguageID: Language to apply lozalization settings. You can see all IDs here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#speech-to-text;

The TextToStreamAsync function return a delegate to manage task status which allow you to bind a function to it's delegate to handle task's completion call.

More informations:

How to get the Speech Service API Access Key and Region ID:

See the official Microsoft documentation for Azure Cognitive Services: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/

sjtuerx / UEAzSpeech

Plugin: AzSpeech

About the project

Links

Documentation

Installation

Blueprint Usage

C++ Usage

Text-to-Voice

Voice-to-Text

Text-to-WAV

WAV-to-Text

Text-to-Stream

More informations:

How to get the Speech Service API Access Key and Region ID:

About

Languages