Emmanuel Schmidbauer's repositories
websocket-audio-stream
pyaudio & websocket to stream real-time audio to speakers
voicefixer
General Speech Restoration
acoustic-model
Acoustic models for: A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
CoMoSpeech
One-step diffusion-based speech synthesis
faster-whisper
Faster Whisper ASR transcription with CTranslate2
FlexFlow
A distributed deep learning framework.
freeswitch
FreeSWITCH is a Software Defined Telecom Stack enabling the digital transformation from proprietary telecom switches to a versatile software implementation that runs on any commodity hardware. From a Raspberry Pi to a multi-core server, FreeSWITCH can unlock the telecommunications potential of any device.
greenswitch
Battle-proven FreeSWITCH Event Socket Protocol client implementation with Gevent
kamailio
Kamailio - The Open Source SIP Server
metaseq
Repo for external large-scale work
peerless
Peerless Animate API
Speech-Backbones
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
Auralis
A Fast TTS Engine
Kokoro-FastAPI
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model with NVIDIA GPU support, queue handling, and auto-stitching
MeloTTS
High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
mod_audio_stream
FreeSWITCH module to stream audio to websocket and receive response
mod_vad
A voice activity detection module for FreeSWITCH.
NeMo-text-processing
NeMo text processing for ASR and TTS
pkg-kamailio-docker
Docker files to easily build Kamailio on different Debian/Ubuntu releases
RAD-MMM
A TTS model that makes a speaker speak new languages
RVC_CLI
RVC CLI enables seamless interaction with Retrieval-based Voice Conversion through commands or HTTP requests.
sherpa-onnx
Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
whisper-cpp-server
whisper-cpp-server
whisperd
Unified API for various whisper implementations
WhisperS2T
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
X-E-Speech-code
X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion