k2-fsa/sherpa-onnx

asr onnx windows linux macos cpp android ios raspberry-pi aarch64 arm32 csharp dotnet mfc speech-to-text text-to-speech vits openkylin risc-v

Supported functions

Speech recognition	Speech synthesis	Speaker verification	Speaker identification
✔️	✔️	✔️	✔️

Spoken Language identification	Audio tagging	Voice activity detection	Keyword spotting
✔️	✔️	✔️	✔️

Supported platforms

Architecture	Android	iOS	Windows	macOS	Linux
x64	✔️		✔️	✔️	✔️
x86	✔️		✔️
arm64	✔️	✔️	✔️	✔️	✔️
arm32	✔️				✔️
riscv64					✔️

Supported programming languages

C++	C	Python	C#	Java	JavaScript	Kotlin	Swift	Go	Dart
✔️	✔️	✔️	✔️	✔️	✔️	✔️	✔️	✔️	✔️

It also supports WebAssembly.

Introduction

This repository supports running the following functions locally, without an Internet connection:

Speech-to-text (i.e., ASR), both streaming and non-streaming
Text-to-speech (i.e., TTS)
Speaker identification
Speaker verification
Spoken language identification
Audio tagging
Voice activity detection (VAD), e.g., silero-vad
Keyword spotting

on the following platforms and operating systems:

x86, x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64)
Linux, macOS, Windows, openKylin
Android, WearOS
iOS
NodeJS
WebAssembly
Raspberry Pi
RV1126
LicheePi4A
VisionFive 2
旭日X3派 (Sunrise X3 Pi)
etc.

with the following APIs:

C++, C, Python, Go, C#
Java, Kotlin, JavaScript
Swift
Dart

Links for pre-built Android APKs

Description	URL	URL for users in China
Streaming speech recognition	Address	Click here
Text-to-speech	Address	Click here
Voice activity detection (VAD)	Address	Click here
VAD + non-streaming speech recognition	Address	Click here
Two-pass speech recognition	Address	Click here
Audio tagging	Address	Click here
Audio tagging (WearOS)	Address	Click here
Speaker identification	Address	Click here
Spoken language identification	Address	Click here
Keyword spotting	Address	Click here

Links for pre-built Flutter apps

Description	URL	URL for users in China
Streaming speech recognition	Address	Click here

Links for pre-trained models

Description	URL
Speech recognition (speech to text, ASR)	Address
Text-to-speech (TTS)	Address
VAD	Address
Keyword spotting	Address
Audio tagging	Address
Speaker identification (Speaker ID)	Address
Spoken language identification (Language ID)	See the multilingual Whisper ASR models under Speech recognition
Punctuation	Address

Useful links

Documentation: https://k2-fsa.github.io/sherpa/onnx/
Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi

How to reach us

Please see https://k2-fsa.github.io/sherpa/social-groups.html for the Next-gen Kaldi (新一代 Kaldi) WeChat and QQ discussion groups.

About

Speech-to-text, text-to-speech, and speaker recognition using Next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, WebSocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, and Flutter.

https://k2-fsa.github.io/sherpa/onnx/index.html

Apache License 2.0

Languages

C++ 42.0%, Python 15.3%, Kotlin 6.1%, CMake 5.8%, Shell 5.7%, JavaScript 5.4%, Java 4.1%, Dart 4.0%, C# 3.5%, Swift 2.9%, C 2.8%, Go 1.9%, HTML 0.4%, Makefile 0.1%, Ruby 0.1%, Objective-C 0.0%