duj12

Xmov.ai

Shanghai

Jean Du's repositories

ASR-2Pass

ASR 2Pass onnxruntime and WebSocket server, based on FunASR (https://github.com/alibaba-damo-academy/FunASR).

Language: HTML · 43 2 3

cnn-lstm-based-malware-document-classification

Uses CNN/LSTM and ensemble models to classify documents according to the API call sequences each document makes.

Language: Python · License: MIT · 12 10

ss-vad

Self-supervised VAD (voice activity detection)

Language: Python · License: MIT · 7 20

wekws

Production First and Production Ready End-to-End Keyword Spotting Toolkit

Language: Python · License: Apache-2.0 · 600

GPTSoVITS

Language: Python · License: MIT · 5 10

kws_demo

KWS demo based on CTC prefix beam search.

Language: Python · 5 1 4

OpenVoice

Instant voice cloning by MyShell

Language: Python · License: MIT · 100

buzz

Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.

Language: Python · License: MIT · 000

Coqui-TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Language: Python · License: MPL-2.0 · 000

duj12

Config files for my GitHub profile.

020

EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Language: Python · License: Apache-2.0 · 000

espnet

End-to-End Speech Processing Toolkit

Language: Python · License: Apache-2.0 · 000

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python · License: MIT · 000

FunASR

A Fundamental End-to-End Speech Recognition Toolkit

Language: Python · License: NOASSERTION · 000

icefall

Language: Python · License: Apache-2.0 · 000

k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.

Language: Cuda · License: Apache-2.0 · 000

s3prl

Language: Python · License: Apache-2.0 · 010

wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Language: C++ · License: Apache-2.0 · 000

CosyVoice

LLM based TTS model, providing inference/training/deployment full-stack ability.

Language: Python · License: Apache-2.0 · 000

modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

Language: Python · License: Apache-2.0 · 000

radtts

Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low-Dimensional (F0 and Energy) Speech Attributes.

Language: Roff · License: MIT · 000

riva-asrlib-decoder

Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva

Language: Python · 000

SenseVoice

Multilingual Voice Understanding Model

Language: Python · License: NOASSERTION · 000

vad_asr

Language: Python · 010

WenetSpeech

A 10000+ hours dataset for Chinese speech recognition

Language: Shell · License: Apache-2.0 · 000

WeTextProcessing

Text Normalization & Inverse Text Normalization

Language: Python · License: Apache-2.0 · 000

wetts

Production First and Production Ready End-to-End Text-to-Speech Toolkit

Language: Python · License: Apache-2.0 · 000

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Python · License: MIT · 000

whisper.cpp

Port of OpenAI's Whisper model in C/C++

Language: C · License: MIT · 000

whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Language: Python · License: BSD-4-Clause · 000