WenzheLiu-Speech

Tencent

Beijing, China

https://wenzheliu-speech.github.io/

Wenzhe Liu (刘文哲)'s repositories

awesome-speech-enhancement

speech enhancement, speech separation, sound source localization

GPL-2.0 | 964 43 1

The-guidebook-of-speech-enhancement

ai-audio-datasets

AI Audio Datasets 🎵. A list of datasets consisting of speech, music, and sound effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.

MIT | 500

penguins-aicodec-demo

5 20

wenzheliu-speech

3 20

aac-datasets

Audio Captioning datasets for PyTorch.

Language: Python | MIT | 2 10

awesome-large-audio-models

Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.

200

JAECBF

Language: Python | 200

MP-SENet

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra

Language: Python | MIT | 2 10

Awesome-Singing-Voice-Synthesis-and-Singing-Voice-Conversion

A paper and project list about cutting-edge Speech Synthesis, Text-to-Speech (TTS), Singing Voice Synthesis (SVS), Voice Conversion (VC), Singing Voice Conversion (SVC), and related interesting works (such as Music Synthesis, Automatic Music Transcription, Automatic MOS Prediction, SSL-based ASR, etc.).

1 10

speech-synthesis-paper

List of speech synthesis papers.

MIT | 100

TFGAN-PLC

A Temporal-Spectral Generative Adversarial Network based End-to-end Packet Loss Concealment for Wideband Speech Transmission

Language: Python | 1 10

torchsubband

PyTorch implementation of subband decomposition

Language: HTML | MIT | 1 10

WenzheLiu-Speech.github.io

Language: HTML | 1 10

aero

Audio Super Resolution in the Spectral Domain

Language: Python | MIT | 010

cutword

A simple, fast tool for word segmentation and named entity recognition

Apache-2.0 | 000

EasyRec

A framework for large scale recommendation algorithms.

Language: Python | Apache-2.0 | 010

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python | Apache-2.0 | 000

gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

BSD-3-Clause | 000

McNet

The official repo: "McNet: Fuse Multiple Cues for Multichannel Speech Enhancement", ICASSP 2023

Language: Python | 010

minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

MIT | 000

multi_quantization

Language: Python | 010

OpenVoice

Instant voice cloning by MyShell.

NOASSERTION | 000

SoundStorm

The reproduced code for Google's SoundStorm

000

SoundStream

This repository is an implementation of this article: https://arxiv.org/pdf/2107.03312.pdf

Language: Python | 010

the-algorithm

Source code for Twitter's Recommendation Algorithm

AGPL-3.0 | 000

tts-frontend-dataset

TTS FrontEnd DataSet: Polyphone / Prosody / TextNormalization

Apache-2.0 | 000

vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Apache-2.0 | 000

vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

MIT | 000

XPhoneBERT

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)

MIT | 000