shengzhang0222

shengzhang0222's starred repositories

ultimatevocalremovergui

GUI for a Vocal Remover that uses Deep Neural Networks.

Language: Python · License: MIT · 1696600

KAN-TTS

KAN-TTS is a speech-synthesis training framework. Please try the demos posted at https://modelscope.cn/models?page=1&tasks=text-to-speech

Language: Python · License: MIT · 47400

wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Language: Python · License: Apache-2.0 · 399000

wespeaker

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

Language: Python · License: Apache-2.0 · 61600

pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Language: Jupyter Notebook · License: MIT · 565300

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python · License: MIT · 2036500

wetts

Production First and Production Ready End-to-End Text-to-Speech Toolkit

Language: Python · License: Apache-2.0 · 36100

mfa_conformer

Language: Python · 13000

naturalspeech2-pytorch

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch

Language: Python · License: MIT · 124600

contentvec

speech self-supervised representations

Language: Python · License: MIT · 43900

so-vits-svc

SoftVC VITS Singing Voice Conversion

Language: Python · License: AGPL-3.0 · 2502900

voice-activity-detection

Pytorch implementation of SELF-ATTENTIVE VAD, ICASSP 2021

Language: Python · License: MIT · 14500

FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python · License: NOASSERTION · 531700

WeTextProcessing

Text Normalization & Inverse Text Normalization

Language: Python · License: Apache-2.0 · 43700

ECAPA-TDNN

Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 for Vox1_O when trained only on Vox2)

Language: Python · License: MIT · 56600

Leaderboard

SpeechIO Leaderboard: a large, robust, comprehensive benchmarking platform for Automatic Speech Recognition.

Language: Python · 42000

One-Shot-Voice-Cloning

One Shot Voice Cloning based on Unet-TTS

Language: Jupyter Notebook · 23500

ChineseTtsTflite

Offline Android Chinese TTS engine based on TensorflowTTS, used for testing TfLite models.

Language: Java · License: Apache-2.0 · 28700

phkit

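Phoneme toolkit. A handy phoneme-processing toolbox with modules for Chinese phonemes, English phonemes, text-to-pinyin conversion, and text normalization.

Language: Python · License: MIT · 7500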

chinese_text_normalization

Chinese text normalization for speech processing

Language: Python · License: MIT · 61200

ParaGen

ParaGen is a PyTorch deep learning framework for parallel sequence generation.

Language: Python · License: NOASSERTION · 18600

TensorFlowTTS_chinese

Chinese TTS

Language: Jupyter Notebook · License: Apache-2.0 · 7500

Speech-Transformer-tf2.0

Transformer for an ASR system (via TensorFlow 2.0)

Language: Python · 11300

score-ensembles-based-SVM

Combine many organs from a plant to predict their species

Language: Jupyter Notebook · 2100

Speaker_Verification_Tencent

Deep Discriminative Embeddings for Duration Robust Speaker Verification

Language: Python · License: MIT · 1900

antispoofing-features

Code for the paper "Bag of features for voice anti-spoofing"

Language: Python · License: MIT · 1300

AM-MobileNet1D

The Additive Margin MobileNet1D is a new lightweight deep learning model for Speaker Recognition, based on the MobileNetV2 architecture and the Additive Margin Softmax (AM-Softmax) loss function.

100

delta

DELTA is a deep learning based natural language and speech processing platform.

Language: Python · License: Apache-2.0 · 159100

speaker-recognition-papers

Share some recent speaker recognition papers and their implementations.

Language: Python · 9000

VoiceprintRecognition-Tensorflow

Voiceprint (speaker) recognition implemented with TensorFlow

Language: Python · License: Apache-2.0 · 28600