saber5433's repositories
AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
agc
Audiogen Codec
aimoneyhunter
ai副业赚钱大集合,教你如何利用ai做一些副业项目,赚取更多额外收益。The Ultimate Guide to Making Money with AI Side Hustles: Learn how to leverage AI for some cool side gigs and rake in some extra cash. Check out the English version for more insights.
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
audio-diffusion-pytorch
Audio generation using diffusion models, in PyTorch.
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
AudioDec
An Open-source Streaming High-fidelity Neural Audio Codec
Bert-VITS2
vits2 backbone with multilingual-bert
DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
ER-NeRF
[ICCV'23] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
GeneFacePlusPlus
GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code
Genshin_Datasets
Genshin Datasets For SVC/SVS/TTS
hifi-gan-bwe
Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.
libriheavy
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
Matcha-TTS
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
megatts2
Unoffical implementation of Megatts2
minbpe
Minimal, clean, code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
so-vits-svc-5.0
Core Engine of Singing Voice Conversion & Singing Voice Clone
SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
StoryTTS
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
TextrolSpeech
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP)
tortoise-tts
A multi-voice TTS system trained with an emphasis on quality
tts-scores
Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models
UniCATS-CTX-txt2vec
[AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS
vall-e
PyTorch implementation of VALL-E(Zero-Shot Text-To-Speech), Reproduced Demo https://lifeiteng.github.io/valle/index.html
VALL-E-X
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io
vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
voice-changer
リアルタイムボイスチェンジャー Realtime Voice Changer