saber5433's repositories

AcademiCodec

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Language:PythonStargazers:0Issues:0Issues:0

agc

Audiogen Codec

License:MITStargazers:0Issues:0Issues:0

aimoneyhunter

ai副业赚钱大集合,教你如何利用ai做一些副业项目,赚取更多额外收益。The Ultimate Guide to Making Money with AI Side Hustles: Learn how to leverage AI for some cool side gigs and rake in some extra cash. Check out the English version for more insights.

Stargazers:0Issues:0Issues:0

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

License:MITStargazers:0Issues:0Issues:0

audio-diffusion-pytorch

Audio generation using diffusion models, in PyTorch.

License:MITStargazers:0Issues:0Issues:0

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

License:MITStargazers:0Issues:0Issues:0

AudioDec

An Open-source Streaming High-fidelity Neural Audio Codec

Language:PythonLicense:NOASSERTIONStargazers:0Issues:0Issues:0

Bert-VITS2

vits2 backbone with multilingual-bert

Language:PythonLicense:AGPL-3.0Stargazers:0Issues:0Issues:0

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

License:NOASSERTIONStargazers:0Issues:0Issues:0

edge-tts

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

License:GPL-3.0Stargazers:0Issues:0Issues:0

ER-NeRF

[ICCV'23] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

License:MITStargazers:0Issues:0Issues:0

GeneFacePlusPlus

GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code

Language:PythonStargazers:0Issues:0Issues:0

Genshin_Datasets

Genshin Datasets For SVC/SVS/TTS

Stargazers:0Issues:0Issues:0

hifi-gan-bwe

Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.

License:MITStargazers:0Issues:0Issues:0

libriheavy

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

License:Apache-2.0Stargazers:0Issues:0Issues:0

Matcha-TTS

[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

License:MITStargazers:0Issues:0Issues:0

megatts2

Unoffical implementation of Megatts2

License:MITStargazers:0Issues:0Issues:0

minbpe

Minimal, clean, code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

License:MITStargazers:0Issues:0Issues:0

so-vits-svc-5.0

Core Engine of Singing Voice Conversion & Singing Voice Clone

License:MITStargazers:0Issues:0Issues:0

SpeechTokenizer

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

License:Apache-2.0Stargazers:0Issues:0Issues:0

StoryTTS

[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations

Language:HTMLLicense:NOASSERTIONStargazers:0Issues:0Issues:0

StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Language:PythonLicense:MITStargazers:0Issues:0Issues:0

TextrolSpeech

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP)

License:MITStargazers:0Issues:0Issues:0

tortoise-tts

A multi-voice TTS system trained with an emphasis on quality

Language:Jupyter NotebookLicense:Apache-2.0Stargazers:0Issues:0Issues:0

tts-scores

Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models

License:Apache-2.0Stargazers:0Issues:0Issues:0

UniCATS-CTX-txt2vec

[AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS

Stargazers:0Issues:0Issues:0

vall-e

PyTorch implementation of VALL-E(Zero-Shot Text-To-Speech), Reproduced Demo https://lifeiteng.github.io/valle/index.html

License:Apache-2.0Stargazers:0Issues:0Issues:0

VALL-E-X

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io

License:MITStargazers:0Issues:0Issues:0

vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

License:MITStargazers:0Issues:0Issues:0

voice-changer

リアルタイムボイスチェンジャー Realtime Voice Changer

License:NOASSERTIONStargazers:0Issues:0Issues:0