i-MaTh's starred repositories
GPT-SoVITS
1 minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
ControlNet
Let us control diffusion models!
Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
StreamDiffusion
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
latent-consistency-model
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
metavoice-src
Foundational model for human-like, expressive TTS
stable-audio-tools
Generative models for conditional audio generation
voicebox-pytorch
Implementation of Voicebox, the new SOTA text-to-speech network from Meta AI, in PyTorch
SpeechTokenizer
This is the code for SpeechTokenizer, presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
MiniGPT4Qwen
Personal project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B MLLM, LLaVA-style, on an RTX 3090/4090 with 24 GB.
spear-tts-pytorch
Implementation of Spear-TTS, a multi-speaker text-to-speech attention network, in PyTorch
tts-scores
Scripts for computing the intelligibility and CLVP scores for evaluating TTS models
stable-audio-metrics
Metrics for evaluating music and audio generative models, with a focus on long-form, full-band, and stereo generations.
UniCATS-CTX-vec2wav
[AAAI 2024] Code for CTX-vec2wav in UniCATS
whisper-punctuator
Zero-shot multimodal punctuation insertion and truecasing using Whisper