felixfuu's starred repositories

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python | License: Apache-2.0 | Stargazers: 7619 | Issues: 75 | Issues: 252

IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 4424 | Issues: 61 | Issues: 339

IC-Light

More relighting!

Language: Python | License: Apache-2.0 | Stargazers: 4050 | Issues: 42 | Issues: 60

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python | License: MIT | Stargazers: 3745 | Issues: 110 | Issues: 68

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable, open-source multimodal dialogue model approaching GPT-4V performance.

Language: Python | License: MIT | Stargazers: 3738 | Issues: 38 | Issues: 267

HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Language: Python | License: NOASSERTION | Stargazers: 2553 | Issues: 33 | Issues: 100

chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Language: Python | License: NOASSERTION | Stargazers: 1325 | Issues: 22 | Issues: 25

style-aligned

Official code for "Style Aligned Image Generation via Shared Attention"

Language: Python | License: Apache-2.0 | Stargazers: 1128 | Issues: 23 | Issues: 24

mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Language: Python | License: Apache-2.0 | Stargazers: 1088 | Issues: 27 | Issues: 84

cv-arxiv-daily

🎓 Automatically Update CV Papers Daily using GitHub Actions (Update Every 12 Hours)

Language: Python | License: Apache-2.0 | Stargazers: 801 | Issues: 37 | Issues: 2

infinite-zoom-automatic1111-webui

Infinite zoom effect extension for AUTOMATIC1111's web UI (Stable Diffusion)

Language: Python | License: MIT | Stargazers: 651 | Issues: 9 | Issues: 62

MimicBrush

Official implementation of the paper "Zero-shot Image Editing with Reference Imitation"

Language: Python | License: Apache-2.0 | Stargazers: 585 | Issues: 11 | Issues: 11

InstructCV

[ICLR 2024] Official Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists"

Language: Python | License: NOASSERTION | Stargazers: 515 | Issues: 32 | Issues: 8

Groma

Grounded Multimodal Large Language Model with Localized Visual Tokenization

Language: Python | License: Apache-2.0 | Stargazers: 464 | Issues: 36 | Issues: 14

MIGC

[CVPR 2024 Highlight] "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis" (Official Implementation)

Language: Python | License: NOASSERTION | Stargazers: 412 | Issues: 18 | Issues: 9

Prompt-Diffusion

Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"

Language: Python | License: Apache-2.0 | Stargazers: 362 | Issues: 7 | Issues: 13

ReMoDiffuse

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

Language: Python | License: NOASSERTION | Stargazers: 305 | Issues: 16 | Issues: 18

Inf-DiT

Official implementation of Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer

Language: Python | License: Apache-2.0 | Stargazers: 268 | Issues: 22 | Issues: 14

scaling_on_scales

When do we not need larger vision models?

Language: Python | License: MIT | Stargazers: 247 | Issues: 4 | Issues: 13
Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 127 | Issues: 4 | Issues: 15

MoMA

MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation

Language: Jupyter Notebook | Stargazers: 126 | Issues: 2 | Issues: 6

SPTSv2

The official implementation of SPTS v2: Single-Point Text Spotting

Language: Python | License: Apache-2.0 | Stargazers: 119 | Issues: 5 | Issues: 19

GenerateU

[CVPR 2024] Generative Region-Language Pretraining for Open-Ended Object Detection

MS-Diffusion

Official implementation of MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance

Language: Python | License: MIT | Stargazers: 69 | Issues: 3 | Issues: 4

llmblueprint

[ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts"

Language: Jupyter Notebook | Stargazers: 59 | Issues: 2 | Issues: 4

DisenDiff

[CVPR 2024, Oral] Attention Calibration for Disentangled Text-to-Image Personalization

Language: Python | License: MIT | Stargazers: 59 | Issues: 3 | Issues: 6

Awesome-Open-Vocabulary-Detection-and-Segmentation

Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

HQ-Edit

HQ-Edit: A High-Quality and High-Coverage Dataset for General Image Editing

Language: Python | License: NOASSERTION | Stargazers: 57 | Issues: 6 | Issues: 5
Language: Python | License: Apache-2.0 | Stargazers: 9 | Issues: 1 | Issues: 3