xiaohou's repositories

KWS_Max-pooling_RHE

Mining effective negative training samples for keyword spotting (PyTorch)

RPN_KWS

Region proposal network based small-footprint keyword spotting (PyTorch)

Language: Python | License: MIT | Stargazers: 51 | Issues: 3 | Issues: 3

Audiomer-PyTorch

A Convolutional Transformer for Keyword Spotting

Language: Python | Stargazers: 3 | Issues: 1 | Issues: 0

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook | License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0

CosyVoice

LLM based TTS model, providing inference/training/deployment full-stack ability.

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0

SenseVoice

Multilingual Voice Understanding Model

Language: Python | License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0

ChatLaw

Chinese legal large language model

License: AGPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

chinese_speech_pretrain

Chinese speech pretrained models

Language: Shell | Stargazers: 0 | Issues: 1 | Issues: 0

ChineseLyrics

Database of 100,000 Chinese song lyrics

Stargazers: 0 | Issues: 0 | Issues: 0

e2e_lfmmi

E2E system with LF-MMI; word N-gram for Mandarin

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

ego2022

JOINT EGO-NOISE SUPPRESSION AND KEYWORD SPOTTING ON SWEEPING ROBOTS

Language: MATLAB | Stargazers: 0 | Issues: 1 | Issues: 0

espeak-ng

eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.

License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

FunASR

A Fundamental End-to-End Speech Recognition Toolkit

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.

Language: Cuda | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

NeMo

NeMo: a toolkit for conversational AI

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

phonemizer

Simple text to phones converter for multiple languages

Language: Python | License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

speechbrain

A PyTorch-based Speech Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

TNN

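TNN: a high-performance, lightweight inference framework for mobile devices, built by Tencent Youtu Lab. Its notable advantages include cross-platform support, high performance, model compression, and code pruning. Building on the original Rapidnet and ncnn frameworks, TNN further strengthens support and performance optimization for mobile devices, while also drawing on the high performance and good extensibility of mainstream open-source frameworks in the industry. TNN is already deployed in applications such as Mobile QQ, Weishi, and Pitu. Contributions are welcome to help further improve the TNN inference framework.

Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0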

tvm

Open deep learning compiler stack for CPU, GPU and specialized accelerators

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

wenet-kws

Production First and Production Ready End-to-End Keyword Spotting Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

wetts

Production First and Production Ready End-to-End Text-to-Speech Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

whisper.cpp

Port of OpenAI's Whisper model in C/C++

Language: C | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0