Qoboty's repositories
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Bert-VITS2-ext
Facial expression and animation experiments based on Bert-VITS2
best-rq-pytorch
Implementation of BEST-RQ, a model for self-supervised learning of speech signals using a random-projection quantizer, in PyTorch.
cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
clash
A rule-based tunnel in Go.
CosyVoice
LLM-based TTS model, providing full-stack inference/training/deployment capabilities.
fish-speech
A brand-new TTS solution
HierSpeechpp
The official implementation of HierSpeech++
Inpaint-Anything
Inpaint anything using Segment Anything and inpainting models.
llark
Code for the paper "LLark: A Multimodal Foundation Model for Music" by Josh Gardner, Simon Durand, Daniel Stoller, and Rachel Bittner.
ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
magic-animate
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
MetaMath
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
OpenVoice
Instant voice cloning by MyShell
parler-tts
Inference and training library for high-quality TTS models.
SoundStorm
A reproduction of Google's SoundStorm
stable-audio-tools
Generative models for conditional audio generation
StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
TTS-xtts
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
UMOE-Scaling-Unified-Multimodal-LLMs
Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"
UniAudio
The open-source code of UniAudio
UniCATS-CTX-txt2vec
CTX-txt2vec, the acoustic model in UniCATS
UniCATS-CTX-vec2wav
Code for CTX-vec2wav in UniCATS
Video-LLaVA
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
vocode-python
🤖 Build voice-based LLM agents. Modular + open source.
VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild