felixfuu's starred repositories
Track-Anything
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
RPG-DiffusionMaster
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
LLaMA2-Accessory
An Open-source Toolkit for LLM Development
awesome-diffusion-categorized
Collection of diffusion model papers categorized by their subareas
PixArt-alpha
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
HumanBench
This repo is the official implementation of HumanBench (CVPR 2023)
LLM-in-Vision
Recent LLM-based CV and related works. Welcome to comment/contribute!
consistencydecoder
Consistency Distilled Diff VAE
LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
T2I-Adapter
T2I-Adapter
Mini-DALLE3
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
stablediffusion
High-Resolution Image Synthesis with Latent Diffusion Models