edge-whisper

https://edge-ai.yomo.run

Real-time transcription by running the Whisper model on a geo-distributed cloud.

This showcase demonstrates real-time speech-to-text transcription using the Whisper model. The model is deployed across geographically distributed cloud infrastructure to ensure optimal performance and low latency for users around the world.

Users are automatically directed to the most suitable backend server based on their location. To determine your assigned backend and hardware configuration, simply ping edgeai.yomo.dev and check the returned IP address. Here's an overview of the available backends:

3.66.190.18: runs whisper.cpp inference on an AWS Graviton 3 Arm-based processor.
20.9.141.176: runs whisper.cpp inference on an Azure Ampere Arm-based processor.
43.131.34.253: runs Whisper inference on an NVIDIA Tesla T4 GPU.
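
As an alternative to pinging by hand, the lookup can be scripted. The following is a minimal sketch, not part of the project: it assumes the hostname resolves to exactly one of the IPv4 addresses listed above, and the names `BACKENDS`, `describe_backend`, and `lookup_backend` are hypothetical helpers introduced here for illustration.

```python
import socket

# Backend table copied from the list above; the descriptions are paraphrased.
BACKENDS = {
    "3.66.190.18": "whisper.cpp on AWS Graviton 3 (Arm)",
    "20.9.141.176": "whisper.cpp on Azure Ampere (Arm)",
    "43.131.34.253": "Whisper on NVIDIA Tesla T4",
}


def describe_backend(ip: str) -> str:
    """Return the backend description for an IP, or a fallback for unknown IPs."""
    return BACKENDS.get(ip, "unknown backend (the list above may be out of date)")


def lookup_backend(hostname: str = "edgeai.yomo.dev") -> tuple[str, str]:
    """Resolve the hostname to an IPv4 address and describe the matching backend.

    Requires network access; raises socket.gaierror if resolution fails.
    """
    ip = socket.gethostbyname(hostname)
    return ip, describe_backend(ip)
```

Calling `lookup_backend()` from a machine with network access returns the resolved IP together with its description. Keeping `describe_backend` separate from the DNS call means the mapping can be checked offline.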

By leveraging this geographically distributed architecture, this showcase delivers fast, accurate, and reliable speech transcription for users globally.

Self-hosting

To deploy this real-time speech transcription system on your own infrastructure, follow these steps:

Start the frontend: Run pnpm run dev to launch the frontend application, which provides the interface for simultaneous interpretation.
Choose your backend: Backends are located in the ./backends/ directory and are built using YoMo. Each backend targets a specific type of AI infrastructure.
Select and run the appropriate backend script:

For Arm-based processors, run backends/whisper_cpp_arm_server.py to load the whisper.cpp model.
For NVIDIA GPUs, run backends/whisper_nvidia_t4_server.py to load the Whisper model.

Please note: These instructions assume that the necessary dependencies, such as Whisper, whisper.cpp, and the YoMo framework, are already installed. Refer to the project documentation for further details.
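
The backend choice above can be sketched as a small helper. This is an illustration under stated assumptions rather than anything shipped with the project: it treats the presence of the `nvidia-smi` executable on the PATH as a rough sign of an NVIDIA GPU, and it recognises Arm hosts by the `arm64` or `aarch64` machine strings. The function names are hypothetical; only the two script paths come from the instructions above.

```python
import platform
import shutil
from typing import Optional

# Script paths as given in the self-hosting instructions.
ARM_SCRIPT = "backends/whisper_cpp_arm_server.py"
NVIDIA_SCRIPT = "backends/whisper_nvidia_t4_server.py"


def pick_backend_script(machine: str, has_nvidia_gpu: bool) -> Optional[str]:
    """Choose a backend script from a machine string and a GPU flag.

    Returns None when neither backend matches the host.
    """
    if has_nvidia_gpu:
        return NVIDIA_SCRIPT
    if machine.lower() in {"arm64", "aarch64"}:
        return ARM_SCRIPT
    return None


def detect_backend_script() -> Optional[str]:
    """Inspect the current host and choose a backend script.

    Assumption: finding nvidia-smi on the PATH is used as a heuristic for
    "an NVIDIA GPU is available"; it does not verify the GPU model.
    """
    has_gpu = shutil.which("nvidia-smi") is not None
    return pick_backend_script(platform.machine(), has_gpu)
```

For example, `pick_backend_script("aarch64", False)` selects the whisper.cpp script, while a host with an NVIDIA GPU gets the Whisper script regardless of CPU architecture. A `None` result means the host matches neither of the two documented targets.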

Apache License 2.0
