Wenzhe Liu (刘文哲)'s repositories
awesome-speech-enhancement
Speech enhancement, speech separation, and sound source localization.
ai-audio-datasets
AI Audio Datasets 🎵. A list of datasets consisting of speech, music, and sound effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
aac-datasets
Audio Captioning datasets for PyTorch.
awesome-large-audio-models
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
Awesome-Singing-Voice-Synthesis-and-Singing-Voice-Conversion
A paper and project list about cutting-edge Speech Synthesis, Text-to-Speech (TTS), Singing Voice Synthesis (SVS), Voice Conversion (VC), Singing Voice Conversion (SVC), and related works of interest (such as Music Synthesis, Automatic Music Transcription, Automatic MOS Prediction, and SSL-based ASR).
speech-synthesis-paper
List of speech synthesis papers.
torchsubband
PyTorch implementation of subband decomposition.
cutword
A simple, fast tool for word segmentation and named entity recognition.
EasyRec
A framework for large scale recommendation algorithms.
gemma_pytorch
The official PyTorch implementation of Google's Gemma models
gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
minbpe
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
OpenVoice
Instant voice cloning by MyShell.
SoundStorm
A reproduction of Google's SoundStorm.
SoundStream
An implementation of the SoundStream article: https://arxiv.org/pdf/2107.03312.pdf
the-algorithm
Source code for Twitter's Recommendation Algorithm
tts-frontend-dataset
TTS front-end dataset: polyphone, prosody, and text normalization.
vall-e
PyTorch implementation of VALL-E (zero-shot text-to-speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
XPhoneBERT
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)