TheWiselyBearded / ChatbotAvatarAI

This Unity project is an AI-based chatbot interface that leverages OpenAI API, Azure Voice API, Google Cloud Speech to Text, and Oculus Lip Sync. Upon initialization of the scene, users can ask the chatbot questions, which it answers within seconds. The chatbot's responses can be heard through an Azure voice generator.

AI NPC Unity Template Scene

Please find the updated version of this repository here, featuring more modern SDKs and a modular system design.

This repository contains a template scene for interfacing with an AI-based NPC using OpenAI API, Azure Voice API, Google Cloud Speech to Text, and Oculus Lip Sync. The project uses Unity 2021.3.x.

The framework allows for easy integration with YOLO-NAS, enabling the NPC to stream virtual camera frames and receive responses from a YOLO-NAS server instance, whether local or remote. The response includes all identified objects along with their confidence scores. Additionally, it leverages the Ready Player Me avatar, providing a reference for mapping Oculus lip sync to avatar models.
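As a rough illustration of the frame-streaming side, the sketch below captures a virtual camera's view and posts each frame to a detection endpoint. The endpoint URL, resolution, and JPEG-over-HTTP transport are assumptions made for the example; the project's Camera Streamer component has its own implementation.

using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

// Hypothetical sketch: stream a virtual camera's frames to a detection server.
// The URL, resolution, and JPEG-over-HTTP transport are assumptions, not the
// project's actual Camera Streamer implementation.
public class FrameStreamerSketch : MonoBehaviour
{
    [SerializeField] private Camera npcCamera;
    [SerializeField] private string serverUrl = "http://localhost:8000/detect"; // assumed endpoint

    private IEnumerator Start()
    {
        var rt = new RenderTexture(640, 480, 24);
        var frame = new Texture2D(rt.width, rt.height, TextureFormat.RGB24, false);
        npcCamera.targetTexture = rt;

        while (true)
        {
            yield return new WaitForEndOfFrame();

            // Copy the camera's render target into a readable texture.
            RenderTexture.active = rt;
            frame.ReadPixels(new Rect(0, 0, rt.width, rt.height), 0, 0);
            frame.Apply();
            RenderTexture.active = null;

            // Send the frame as a JPEG; the server's JSON reply lists detections.
            byte[] jpg = frame.EncodeToJPG(75);
            using (var www = UnityWebRequest.Put(serverUrl, jpg))
            {
                www.method = UnityWebRequest.kHttpVerbPOST;
                www.SetRequestHeader("Content-Type", "image/jpeg");
                yield return www.SendWebRequest();
            }

            yield return new WaitForSeconds(0.5f); // throttle to ~2 frames per second
        }
    }
}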

Usage

Once you've set up the configuration files, you can run the main scenes. After you speak to the NPC, it will respond within seconds. In the NPCVoiceSimple-MetaAssets scene, the example models provided by Meta for Lip Sync respond to user inquiries. In the NPCVoiceVisionSimple scene, the NPC can be configured to see its environment using YOLO-NAS.

Interfacing with OpenAI API

The ChatbotController game object contains the main script for interfacing with the OpenAI API. To adjust the input parameters for each OpenAI API request, you can modify this component.

  • Selecting the Default Model: You can choose the default model by selecting an option from the model dropdown in the ChatbotController inspector.
  • Assigning a Personality Profile: To assign a personality profile, specify a name and personality description in the ChatbotController component. To select one of the available personalities, enter its name in the "Set Personality Profile Name" inspector field (a minimal request sketch follows this list).
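For orientation, here is a minimal sketch of what such a request can look like when built from a name, a personality description, and a model string. It assumes direct REST access to the chat completions endpoint; the class and field names are illustrative stand-ins, not the actual ChatbotController code.

using System;
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

// Minimal sketch, assuming direct REST access to the chat completions
// endpoint. Class and field names are illustrative; the real
// ChatbotController wires these values up through its inspector fields.
public class PersonalityRequestSketch : MonoBehaviour
{
    [Serializable] public class ChatMessage { public string role; public string content; }
    [Serializable] public class ChatRequest { public string model; public ChatMessage[] messages; }

    [SerializeField] private string npcName = "Guide";
    [SerializeField, TextArea] private string personalityDescription = "A friendly, concise museum guide.";
    [SerializeField] private string model = "gpt-3.5-turbo";
    private string apiKey; // read from services_config.json at runtime (see Setup)

    public IEnumerator Ask(string userQuestion)
    {
        // The personality profile becomes the system message on every request.
        var body = JsonUtility.ToJson(new ChatRequest
        {
            model = model,
            messages = new[]
            {
                new ChatMessage { role = "system", content = $"You are {npcName}. {personalityDescription}" },
                new ChatMessage { role = "user", content = userQuestion }
            }
        });

        using (var www = new UnityWebRequest("https://api.openai.com/v1/chat/completions", "POST"))
        {
            www.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(body));
            www.downloadHandler = new DownloadHandlerBuffer();
            www.SetRequestHeader("Content-Type", "application/json");
            www.SetRequestHeader("Authorization", "Bearer " + apiKey);
            yield return www.SendWebRequest();

            if (www.result == UnityWebRequest.Result.Success)
                Debug.Log(www.downloadHandler.text); // raw JSON; the reply text is then voiced via Azure
            else
                Debug.LogError(www.error);
        }
    }
}

In this pattern, the personality profile is folded into the system message, which is what keeps the NPC's replies in character across requests.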

Working with YOLO-NAS and Camera Streamer

Demonstration

The Camera Streamer component allows you to specify the endpoint of the YOLO-NAS server. This component receives a JSON object from the server, listing all identified objects. To process this data, you can explore the ReceiveData thread.
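As a sketch of what handling that payload can look like, the classes below parse a response of an assumed shape, {"detections": [{"label": ..., "confidence": ...}]}; the actual field names depend on the server's schema.

using System;
using UnityEngine;

// Hypothetical sketch of parsing the server's detection JSON. The field
// names below are assumptions about the schema, not the server's exact format.
[Serializable]
public class Detection
{
    public string label;      // class name of the identified object
    public float confidence;  // detection confidence score, 0..1
}

[Serializable]
public class DetectionResponse
{
    public Detection[] detections;
}

public static class DetectionParser
{
    // Called (for example) from the ReceiveData thread once a complete
    // JSON message has been read from the server.
    public static void HandleMessage(string json)
    {
        DetectionResponse response = JsonUtility.FromJson<DetectionResponse>(json);
        foreach (Detection d in response.detections)
            Debug.Log($"{d.label}: {d.confidence:P0}");
    }
}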

Make sure to configure the necessary settings and explore the various components to customize and enhance the NPC's behavior and interactions.

For the YOLO-NAS server instance, please check out this repository.

Group Based NPCs [Experimental]

Check out the NPCGroup scene to set up multiple NPCs that chat with one another. One NPC is designed to start the conversation upon collision with another NPC (or the player). Using the NPCGroupCommunicator component, simply assign the NPC group you'd like, and the NPCs will use event systems to communicate with one another once they finish speaking.
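The hand-off can be pictured with the following minimal sketch, in which each NPC raises an event when it finishes speaking and the next NPC in the group responds. The names here are hypothetical stand-ins, not the actual NPCGroupCommunicator implementation.

using System;
using UnityEngine;

// Minimal sketch of the event-driven hand-off, with illustrative names;
// the actual NPCGroupCommunicator component differs.
public class GroupNpcSketch : MonoBehaviour
{
    public event Action FinishedSpeaking;

    // The NPC that should reply when this one finishes. Leave empty on the
    // last NPC so the exchange ends instead of looping forever.
    [SerializeField] private GroupNpcSketch nextInGroup;

    private void Awake()
    {
        if (nextInGroup != null)
            FinishedSpeaking += () => nextInGroup.Speak("Replying to the previous speaker.");
    }

    public void Speak(string line)
    {
        Debug.Log($"{name}: {line}");
        // ...text-to-speech playback and lip sync would run here...
        FinishedSpeaking?.Invoke(); // signal the next NPC to take its turn
    }
}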

Roadmap:

Planned additions will enhance the project's capabilities and expand the available libraries and features over time.

Feel free to contribute to the project!

Setup

Before running the scene, you'll need to set up the following services and create a configuration file for the application to read at runtime:

  1. Google Cloud Speech to Text
  2. Azure Voice API
  3. OpenAI API

Review the setup instructions for the following repositories that are used in this project:

Configuration File

In the StreamingAssets folder, create a services_config.json file with the following template, and replace the placeholder values with your own API keys and region information:

{
  "OpenAI_APIKey": "your_openai_api_key",
  "AzureVoiceSubscriptionKey": "your_azure_voice_subscription_key",
  "AzureVoiceRegion": "your_azure_voice_region"
}
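At runtime, a loader can read this file from StreamingAssets and deserialize it. The sketch below mirrors the JSON keys above; the class and method names are illustrative, and the direct file read assumes desktop platforms (on Android, StreamingAssets must be read through UnityWebRequest instead).

using System;
using System.IO;
using UnityEngine;

// Sketch of reading services_config.json at runtime. Field names mirror the
// JSON keys above; the class name and loader are illustrative, and the direct
// file read assumes desktop platforms.
[Serializable]
public class ServicesConfig
{
    public string OpenAI_APIKey;
    public string AzureVoiceSubscriptionKey;
    public string AzureVoiceRegion;
}

public static class ConfigLoader
{
    public static ServicesConfig Load()
    {
        string path = Path.Combine(Application.streamingAssetsPath, "services_config.json");
        return JsonUtility.FromJson<ServicesConfig>(File.ReadAllText(path));
    }
}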

Also create a gcp_credentials.json file, which the Google Cloud client reads at runtime for its configuration properties, using the following template:

{
  "type": "service_account",
  "project_id": "YOUR PROJECT ID",
  "private_key_id": "YOUR PRIVATE KEY ID",
  "private_key": "YOUR PRIVATE KEY",
  "client_email": "YOUR CLIENT EMAIL",
  "client_id": "YOUR CLIENT ID",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "YOUR CLIENT CERT URL"
}

Known Issues

Google Cloud's speech to text limits streaming requests to 5 minutes, so the developer must add the ability to start, stop, and restart the stream.
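One possible workaround, sketched below, is a coroutine that tears down and restarts the stream shortly before the cutoff. StartStream and StopStream are placeholders for whichever calls the streaming speech-to-text package actually exposes.

using System.Collections;
using UnityEngine;

// Hedged sketch of one workaround: restart the stream shortly before the
// five-minute cutoff. StartStream/StopStream are placeholders for the
// streaming speech-to-text package's actual calls.
public class StreamRestarter : MonoBehaviour
{
    private const float RestartIntervalSeconds = 4.5f * 60f; // stay under the 5-minute cap

    private IEnumerator Start()
    {
        while (true)
        {
            StartStream();
            yield return new WaitForSeconds(RestartIntervalSeconds);
            StopStream(); // close the current request so a fresh one begins next loop
        }
    }

    private void StartStream() { /* begin a new streaming recognition request */ }
    private void StopStream()  { /* end the current streaming request */ }
}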

Sometimes, when opening the project for the first time, you might get the following error message: "Multiple precompiled assemblies with the same name Newtonsoft.Json.dll included on the current platform." This error occurs because Unity enforces a Newtonsoft import through its services.core dependency. To resolve it:

  1. Close the project
  2. Open the file explorer and navigate to \Library\PackageCache\com.oshoham.unity-google-cloud-streaming-speech-to-text@0.1.8\Plugins
  3. Delete the Newtonsoft.Json.dll file
  4. Reopen the project and hit "Ignore"
  5. Delete Newtonsoft.Json.dll again from the same location
  6. The import should now complete.

Notes

Models Used: This project currently uses the TextDavinciV3 and ChatGpt3_5Turbo models. GPT4 is supported in the code, though the demo scenes do not use it.

In addition to the APIs and packages mentioned above, this project also uses the Meta Movement SDK. More information on this SDK can be found in its GitHub repository at https://github.com/oculus-samples/Unity-Movement.
