Macoron / whisper.unity

Running speech to text model (whisper.cpp) in Unity3d on your local machine.

What would it take to have this run on Xbox consoles?

dustinpb97 opened this issue · comments

What would it take to build the libraries to have this run on Xbox consoles? Would that be building the source Whisper repo for Xbox or this Unity bindings repo?

I'm going to update this with info as I try to figure this out.

Some prerequisites for Xbox development:
Unity Editor:

  • Unity 2022.3.34f1 Link
  • Unity Game Core Series (Scarlett) Add-on (2022.3.34f1): Link
  • Unity Game Core Xbox One Add-on (2022.3.34f1): Link
  • Game Core Render Pipeline package (Add as tarball from Unity Package Manager)

Requirements:

  • Windows 10 64-bit (Version 1709 or higher) or Windows 11.
  • Visual Studio 2019 (16.9 or later), or Visual Studio 2022, Professional or Enterprise. The "Game development with C++" workload is required. Visual Studio 2019 version 16.11 or the latest Visual Studio 2022 update are recommended.
  • Windows 10 SDK version 22000 or later. The Windows 10 SDK can be installed in two ways: by selecting "Windows 10 SDK" in the optional components of the Visual Studio Installer or from the Windows 10 SDK download page
  • Make sure Visual Studio and the workloads and components listed below are installed before installing the GDKX
  • Microsoft October 2023 GDKX Update 4 (10.0.25398.1940) Link
  • Note: the exact GDK version stated in the system requirements is required. A newer "Update" version of the GDK will not work.
  • Console operating system updated to the latest Microsoft Console Recovery.

Visual Studio 2022 Link (17.10.3 works as of 6/2024)

Workloads to include with Visual Studio

  • .NET Desktop Development
  • Desktop Development with C++
  • Game Development with C++
  • Game Development with Unity

Individual Components added
(Note: This specific version of VS needs the MSVC v14.34-17.4. It may not need it in future updates of VS. More Info)

  • MSVC v143 - VS 2022 C++ x64/x86 build tools (v14.34-17.4)
  • Windows 11 SDK 10.0.22621.0

I've modified the build_cpp.bat file to support Xbox-targeted builds. Feel free to change the cmake and msbuild paths in it back to the ones on your PATH variable if you wish.

build_cpp.txt

I've changed the build configuration settings in Visual Studio, but that didn't resolve my error.
As of now, my latest error is:

`"D:\Unity Projects\Whisper\build\ALL_BUILD.vcxproj" (build target) (1) ->
"D:\Unity Projects\Whisper\build\whisper.vcxproj" (default target) (3) ->
(Link target) ->
ggml.obj : error LNK2019: unresolved external symbol __imp_strdup referenced in function gguf_add_tensor [D:\Unity Projects\Whisper\build\whisper.vcxproj]

D:\Unity Projects\Whisper\build\bin\Release\whisper.dll : fatal error LNK1120: 1 unresolved externals [D:\Unity Projects\Whisper\build\whisper.vcxproj]`

After changing the references to `strdup` in ggml.c to `_strdup`, I was able to successfully build the whisper DLLs for Xbox. I'm still working on integrating this with the rest of the project and have yet to verify that it works.
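
For readers who would rather not edit every call site, the rename above can be sketched as a single macro. This is only an illustration: `PORTABLE_STRDUP` and `copy_name` are hypothetical names that do not exist in ggml, and the sketch assumes the linker error comes from the Xbox toolchain not resolving the POSIX-style name `strdup`, while the MSVC runtime's own name `_strdup` is available.

```c
/* Ask glibc and other POSIX C libraries to declare strdup.
   This must come before any #include. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* PORTABLE_STRDUP is a hypothetical macro name, not something ggml defines. */
#if defined(_MSC_VER)
#define PORTABLE_STRDUP _strdup   /* name used by the MSVC C runtime */
#else
#define PORTABLE_STRDUP strdup    /* POSIX name on other toolchains */
#endif

/* Returns a heap copy of name, or NULL if allocation fails.
   The caller releases the copy with free(). */
char *copy_name(const char *name) {
    assert(name != NULL);
    return PORTABLE_STRDUP(name);
}
```

With a macro like this, the same source should keep building on desktop toolchains, whereas a plain search-and-replace to `_strdup` only compiles with MSVC-style runtimes.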

@dustinpb97 this looks very interesting. Thank you for sharing your research. Looking forward to hearing more about your progress.

I was having crashes when attempting to start a stream on Xbox One. The issue was that the whisper source had been built with AVX2 instead of AVX (AVX1). After rebuilding with AVX, we were able to run our project on Xbox One, and voice streaming no longer crashed the application. Because of this, though, whisper performance is really slow. That's all I have for today, but tomorrow I will look into improving performance by tweaking streaming and recording settings.
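
The crash fits the hardware: the Xbox One CPU supports AVX but not AVX2, so a binary compiled with AVX2 instructions will hit an illegal instruction. A minimal sketch of a runtime check that would reveal this before loading a native library is shown below. The function names and the "avx2" / "avx" / "baseline" labels are hypothetical, and whether these intrinsics behave the same under the GDK toolchain is an assumption that has not been verified here.

```c
#include <assert.h>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>     /* __cpuid, __cpuidex */
#include <immintrin.h>  /* _xgetbv */

/* AVX is only usable if the OS saves the YMM registers:
   OSXSAVE (leaf 1, ECX bit 27) set and XCR0 bits 1 and 2 enabled. */
static int os_saves_ymm(void) {
    int r[4];
    __cpuid(r, 1);
    if ((r[2] & (1 << 27)) == 0) return 0;
    return (_xgetbv(0) & 0x6) == 0x6;
}

int cpu_has_avx(void) {
    int r[4];
    __cpuid(r, 1);                      /* leaf 1: ECX bit 28 = AVX */
    return os_saves_ymm() && (r[2] & (1 << 28)) != 0;
}

int cpu_has_avx2(void) {
    int r[4];
    __cpuid(r, 0);                      /* leaf 0: EAX = highest supported leaf */
    if (r[0] < 7) return 0;
    __cpuidex(r, 7, 0);                 /* leaf 7, subleaf 0: EBX bit 5 = AVX2 */
    return cpu_has_avx() && (r[1] & (1 << 5)) != 0;
}
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* GCC and Clang expose the same information through builtins. */
int cpu_has_avx(void)  { __builtin_cpu_init(); return __builtin_cpu_supports("avx")  != 0; }
int cpu_has_avx2(void) { __builtin_cpu_init(); return __builtin_cpu_supports("avx2") != 0; }
#else
/* Not an x86 target: report no AVX support. */
int cpu_has_avx(void)  { return 0; }
int cpu_has_avx2(void) { return 0; }
#endif

/* Hypothetical helper: names the most capable build variant this CPU can run. */
const char *pick_simd_build(void) {
    int avx  = cpu_has_avx();
    int avx2 = cpu_has_avx2();
    assert(!avx2 || avx);               /* AVX2 support implies AVX support */
    if (avx2) return "avx2";
    if (avx)  return "avx";
    return "baseline";
}
```

On a console target the answer is known at build time, so a check like this is mostly useful as a diagnostic when the same project also ships desktop builds.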

So I was able to adjust the streaming settings to increase performance. It streams great in the Unity Editor, but when running on Xbox, it takes 5+ minutes to output a text result. It does work, though.

That's unfortunate. Here are some ideas that you can try:

  1. Avoid using Streaming mode and run standard Microphone transcription. Streaming has overhead and is especially fragile on such weak hardware.
  2. Make sure that you compile both whisper.cpp and the Unity project in release mode.
  3. Try to use quantized models, like ggml-tiny.en-q5_1 or ggml-tiny.en-q8_0.
  4. Use the "Speed Up" setting from WhisperManager.
  5. Play around with the number of threads allocated to whisper, setting it to one or two.

In theory, it should be possible. For example, the original author managed to run it in almost real time on a Raspberry Pi.

I've done 1, 2, and 3 and that has greatly increased performance. It returns a result in about 30 seconds now. I tried enabling the "Speed Up" setting but it was returning an error code of -1. How would I adjust the number of threads allocated to whisper?

> I tried enabling the "Speed Up" setting but it was returning an error code of -1.

It looks like the authors silently deprecated this feature. The speed-up fails on any input. I will probably need to remove it from WhisperManager as well.

> How would I adjust the number of threads allocated to whisper?

This is the ThreadsCount parameter in WhisperParams. The easiest way to change it is somewhere here:

```csharp
public static WhisperParams GetDefaultParams(WhisperSamplingStrategy strategy =
    WhisperSamplingStrategy.WHISPER_SAMPLING_GREEDY)
{
    LogUtils.Verbose($"Requesting default Whisper params for strategy {strategy}...");
    var nativeParams = WhisperNative.whisper_full_default_params(strategy);
    LogUtils.Verbose("Default params generated!");
    var param = new WhisperParams(nativeParams)
    {
        // usually don't need C++ output log in Unity
        PrintProgress = false,
        PrintRealtime = false,
        PrintTimestamps = false
    };

    // for some reason on android one thread works
    // 10x faster than multithreading
#if UNITY_ANDROID && !UNITY_EDITOR
    param.ThreadsCount = 1;
#endif
    return param;
}
```

Wanted to report back that I was able to achieve results in less than 5 seconds, which is ideal. I reduced the audio context size (Audio Ctx in the WhisperManager inspector) to 256. We now have Whisper running on the Xbox One :)
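
As a rough guide to what an audio context of 256 means: Whisper's encoder uses 1500 positions for its full 30-second window, so each position covers about 20 ms of audio. The arithmetic can be sketched as follows. The function and macro names are hypothetical, and treating a value of 0 as "use the full window" is an assumption about how the setting is passed through to whisper.cpp.

```c
#include <assert.h>

/* Local illustrative constants, not taken from whisper.h:
   the encoder's full context is 1500 positions for a 30 s window. */
#define FULL_AUDIO_CTX 1500
#define FULL_WINDOW_MS 30000

/* Approximate milliseconds of audio the encoder sees for a given
   audio context size. A value <= 0 is assumed to mean "use the default". */
int audio_ctx_to_ms(int audio_ctx) {
    assert(audio_ctx <= FULL_AUDIO_CTX);
    if (audio_ctx <= 0) return FULL_WINDOW_MS;
    return audio_ctx * (FULL_WINDOW_MS / FULL_AUDIO_CTX);  /* 20 ms per position */
}
```

By this estimate, 256 positions cover roughly 5.1 seconds per inference instead of 30, which is consistent with the large speed-up reported above. The likely trade-off is that utterances longer than that window are not fully seen in a single pass.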

I'd like to submit a pull request to share the libwhisper_xbox.dll, but I'm not sure how to do that. I tried creating one but it said forbidden.

> Wanted to report back that I was able to achieve results in less than 5 seconds, which is ideal. I reduced the audio context (Audio Ctx in WhisperManager inspector) size to 256. We now have Whisper running on the Xbox one :)

Amazing job! How is the quality? Is it usable for your use case? I highly recommend posting it in the whisper.cpp discussions; the people there will be glad to hear about it.

> I'd like to submit a pull request to share the libwhisper_xbox.dll, but I'm not sure how to do that. I tried creating one but it said forbidden.

That's weird. You should be able to do that. You would need to create a fork and make your changes in a separate branch. Alternatively, you can send me the link to the branch and I can open the PR myself.