whaozl

Shanghai, China

blog.csdn.net/zhulinniao

Anjos's repositories

whisper-plus

WhisperPlus: Advancing Speech-to-Text Processing 🚀

Language: Python | Apache-2.0 | 1 | 0 | 0

3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Language: Python | Apache-2.0 | 0 | 0 | 0

attention-is-all-you-need-pytorch

A PyTorch implementation of the Transformer model in "Attention is All You Need".

Language: Python | MIT | 0 | 0 | 0

bertTokenizer

Java implementation of the BERT tokenizer; supports outputting ONNX tensors for ONNX model inference.

Language: Java | 0 | 0 | 0

CapsWriter-Offline

A crude but handy offline edition of CapsWriter, a voice input tool for PC.

Language: Python | 0 | 0 | 0

ChatTTS

A generative speech model for daily dialogue.

Language: Python | AGPL-3.0 | 0 | 0 | 0

faster-whisper

Faster Whisper transcription with CTranslate2

MIT | 0 | 0 | 0

GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs

Language: Python | Apache-2.0 | 0 | 0 | 0

icefall

Language: Python | Apache-2.0 | 0 | 0 | 0

k2-v2.0-pre-branch-HLG

FSA/FST algorithms, differentiable, with PyTorch compatibility.

Language: Cuda | Apache-2.0 | 0 | 0 | 0

kaldi-native-fbank

Kaldi-compatible online fbank extractor without external dependencies

Language: C++ | Apache-2.0 | 0 | 0 | 0

Leaderboard

SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition.

Language: Python | 0 | 0 | 0

LLaSM

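The first open-source, commercially usable dialogue model that supports bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors it may introduce.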

Language: Python | Apache-2.0 | 0 | 0 | 0

moshi

Apache-2.0 | 0 | 0 | 0

RealSI

RealSI: Open Benchmark for Simultaneous Interpretation in Real-world Scenarios

CC-BY-4.0 | 0 | 0 | 0

Recorder

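HTML5 JS audio recording in mp3, wav, ogg, webm, and amr formats. Supports PC and some Android and iOS browsers, hybrid apps (Android and iOS app source code provided), and WeChat. Provides ASR speech-to-text, an H5 voice call and chat example, and DTMF encoding and decoding.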

Language: JavaScript | MIT | 0 | 0 | 0

riva-asrlib-decoder

Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva

Language: Python | 0 | 0 | 0

SD-Eval

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Language: Python | Apache-2.0 | 0 | 0 | 0

sherpa-onnx

Real-time speech recognition using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, x86_64 servers, websocket server/client, C/C++, Python, Kotlin

Language: C++ | Apache-2.0 | 0 | 0 | 0

silero-vad5

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Language: Python | MIT | 0 | 0 | 0

speech-to-speech

Language: Python | 0 | 0 | 0

SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python | MIT | 0 | 0 | 0

TeleSpeech-ASR

Language: Python | 0 | 0 | 0

TMSpeech

A tool for slacking off during Tencent Meeting sessions

Language: C# | MIT | 0 | 0 | 0

west

We Speech Transcript based on LLM, in 300 lines of code.

Language: Python | Apache-2.0 | 0 | 0 | 0

Whisper-Finetune

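Fine-tune the Whisper speech recognition model. Supports training on data without timestamps, data with timestamps, and data without speech. Accelerates inference and supports web deployment, Windows desktop deployment, and Android deployment.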

Language: C | Apache-2.0 | 0 | 0 | 0

whisper-jni

A JNI wrapper for whisper.cpp that allows transcribing speech to text in Java.

Language: Java | Apache-2.0 | 0 | 0 | 0

whisper-medusa

Whisper with Medusa heads

Language: Python | MIT | 0 | 0 | 0

whisper.cpp

Port of OpenAI's Whisper model in C/C++

Language: C++ | MIT | 0 | 0 | 0

whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

BSD-2-Clause | 0 | 0 | 0