dongsig

Tencent

Shanghai

dyang's repositories

vall-e

An unofficial PyTorch implementation of the audio LM VALL-E, WIP

Language: Python, License: MIT

AEC-Challenge

AEC Challenge

Language: Python, License: MIT

AudioAge

Transferring audio features to build models for rare conditions with scarce data

License: Apache-2.0

AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

License: NOASSERTION

AugLy

A data augmentations library for audio, image, text, and video.

Language: Python, License: NOASSERTION

Auto-Age-Labeler

A web application that uses artificial intelligence to automatically label voice datasets with the age of the speaker.

License: MIT

Bert-VITS2

VITS2 backbone with BERT

Language: Python, License: AGPL-3.0

CITISEN

License: MIT

create_wsj1_2345_db

Collection of scripts to create a dataset of noisy multi-channel reverberant mixtures based on wsj1 and CHiME3 datasets.

License: MIT

E2E-KWS

End-to-End Keyword Spotting (E2E-KWS) using a character level LSTM

FreeVC

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion

License: MIT

k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.

License: Apache-2.0

kaldi_rt_decoder

Using a microphone

License: NOASSERTION

KalmanNet_TSP

Code for KalmanNet

latex-examples

Small (La)TeX files showing features, solutions, and attempts

musegan

An AI for Music Generation

License: MIT

OpenChineseLLaMA

Chinese large language model base generated through incremental pre-training on Chinese datasets

License: GPL-3.0

PaddleSpeech

Easy-to-use Speech Toolkit including SOTA ASR pipeline, influential TTS with text frontend and End-to-End Speech Simultaneous Translation.

License: Apache-2.0

ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch

License: MIT

Percepnet-Keras

PercepNet implemented using Keras; it still needs to be optimized and tuned.

License: BSD-3-Clause

Pitch-Tracking

Pitch tracking in real-time with the Kalman filter

pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding

License: MIT

Real-ESRGAN

Real-ESRGAN aims at developing Practical Algorithms for General Image Restoration.

License: BSD-3-Clause

sound-source-localization-algorithm_DOA_estimation

Some traditional algorithms for sound source localization (DOA estimation) of speech signals

License: Apache-2.0

Spoken-Keyword-Spotting

In this repository, we explore using a hybrid system consisting of a Convolutional Neural Network and a Support Vector Machine for Keyword Spotting task.

License: MIT

ssspy

A Python toolkit for sound source separation.

License: Apache-2.0

SummerTTS

SummerTTS is a standalone, C++-based Chinese and English speech synthesis (TTS) project. It runs locally without a network connection and has no extra dependencies; after a one-step build it is ready to use for Chinese and English speech synthesis.

torch-audiomentations

Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.

License: MIT

torchiva

Blind source separation with the independent vector analysis family of algorithms in torch

Language: Python, License: MIT

Voice2Face

http://www.facegood.cc

License: MIT