Pass-O-Guava's starred repositories
DiffSynth-Studio
Enjoy the magic of Diffusion models!
dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
florence2-finetuning
Quick exploration into fine-tuning Florence-2
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
OmniCorpus
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
stable-audio-tools
Generative models for conditional audio generation
DiffusionDPO
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
RectifiedFlow
Official Implementation of Rectified Flow (ICLR 2023 Spotlight)
flash-diffusion
Official implementation of ⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
vision-lstm
xLSTM as Generic Vision Backbone
build-nanogpt
Video+code lecture on building nanoGPT from scratch
3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization