Jean Du's repositories
cnn-lstm-based-malware-document-classification
Uses CNN/LSTM and ensemble models to classify documents according to the API call sequences each document makes.
buzz
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
Coqui-TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
espnet
End-to-End Speech Processing Toolkit
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
FunASR
A Fundamental End-to-End Speech Recognition Toolkit
k2
FSA/FST algorithms, differentiable, with PyTorch compatibility.
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
CosyVoice
LLM-based TTS model, providing full-stack inference, training, and deployment capabilities.
modelscope
ModelScope: bring the notion of Model-as-a-Service to life.
radtts
Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low-Dimensional (F0 and Energy) Speech Attributes.
riva-asrlib-decoder
Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva
SenseVoice
Multilingual Voice Understanding Model
WenetSpeech
A 10,000+ hour dataset for Chinese speech recognition
WeTextProcessing
Text Normalization & Inverse Text Normalization
wetts
Production First and Production Ready End-to-End Text-to-Speech Toolkit
whisper
Robust Speech Recognition via Large-Scale Weak Supervision
whisper.cpp
Port of OpenAI's Whisper model in C/C++
whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)