jingyonghou

xiaohou's repositories

KWS_Max-pooling_RHE

Mining effective negative training samples for keyword spotting (PyTorch)

Language: Python | 55 3 1

RPN_KWS

Region proposal network based small-footprint keyword spotting (PyTorch)

Language: Python | License: MIT | 51 3 3

Audiomer-PyTorch

A Convolutional Transformer for Keyword Spotting

Language: Python | 3 10

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook | License: MIT | 100

CosyVoice

LLM based TTS model, providing inference/training/deployment full-stack ability.

Language: Python | License: Apache-2.0 | 100

SenseVoice

Multilingual Voice Understanding Model

Language: Python | License: MIT | 100

ChatLaw

Large model for Chinese law

License: AGPL-3.0 | 000

chinese_speech_pretrain

Chinese speech pretrained models

Language: Shell | 010

ChineseLyrics

Database of 100,000 Chinese song lyrics

000

e2e_lfmmi

E2E system with LF-MMI; word N-gram for Mandarin

Language: Python | 010

ego2022

JOINT EGO-NOISE SUPPRESSION AND KEYWORD SPOTTING ON SWEEPING ROBOTS

Language: MATLAB | 010

espeak-ng

eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.

License: GPL-3.0 | 000

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python | License: MIT | 010

FunASR

A Fundamental End-to-End Speech Recognition Toolkit

Language: Python | License: MIT | 010

hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Language: Python | License: MIT | 010

k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.

Language: Cuda | License: NOASSERTION | 010

minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

License: MIT | 000

NeMo

NeMo: a toolkit for conversational AI

Language: Jupyter Notebook | License: Apache-2.0 | 010

phonemizer

Simple text to phones converter for multiple languages

Language: Python | License: GPL-3.0 | 000

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | 000

speechbrain

A PyTorch-based Speech Toolkit

Language: Python | License: Apache-2.0 | 010

THE-2020-PERSONALIZED-VOICE-TRIGGER-CHALLENGE-BASELINE-SYSTEM

Language: Shell | 010

TNN

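TNN: a high-performance, lightweight inference framework for mobile devices, developed by Tencent Youtu Lab. It offers cross-platform support, high performance, model compression, and code pruning, among other notable advantages. Building on the Rapidnet and ncnn frameworks, TNN further strengthens support and performance optimization for mobile devices, while also drawing on the high performance and good extensibility of mainstream open-source frameworks in the industry. TNN is already deployed in applications such as mobile QQ, Weishi, and Pitu. Contributions are welcome to help further improve the TNN inference framework.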

Language: C++ | License: NOASSERTION | 010

tvm

Open deep learning compiler stack for CPU, GPU and specialized accelerators

License: Apache-2.0 | 000

wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Language: Python | License: Apache-2.0 | 010

wenet-kws

Production First and Production Ready End-to-End Keyword Spotting Toolkit

Language: Python | License: Apache-2.0 | 010

wenet_trt8

Language: Python | License: Apache-2.0 | 010

wetts

Production First and Production Ready End-to-End Text-to-Speech Toolkit

Language: Python | License: Apache-2.0 | 010

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Jupyter Notebook | License: MIT | 010

whisper.cpp

Port of OpenAI's Whisper model in C/C++

Language: C | License: MIT | 010