felixfuu's starred repositories

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python | License: Apache-2.0 | 7619 75 252

IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.

Language: Jupyter Notebook | License: Apache-2.0 | 4424 61 339

IC-Light

More relighting!

Language: Python | License: Apache-2.0 | 4050 42 60

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python | License: MIT | 3745 110 68

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal chat model approaching GPT-4V performance.

Language: Python | License: MIT | 3738 38 267

HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Language: Python | License: NOASSERTION | 2553 33 100

chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Language: Python | License: NOASSERTION | 1325 22 25

style-aligned

Official code for "Style Aligned Image Generation via Shared Attention"

Language: Python | License: Apache-2.0 | 1128 23 24

mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Language: Python | License: Apache-2.0 | 1088 27 84

cv-arxiv-daily

🎓 Automatically update CV papers daily using GitHub Actions (updates every 12 hours)

Language: Python | License: Apache-2.0 | 801 37 2

MAP-NEO

Language: Python | 731 11 27

infinite-zoom-automatic1111-webui

Infinite zoom effect extension for AUTOMATIC1111's Stable Diffusion webui

Language: Python | License: MIT | 651 9 62

MimicBrush

Official implementation of the paper "Zero-shot Image Editing with Reference Imitation"

Language: Python | License: Apache-2.0 | 585 11 11

InstructCV

[ICLR 2024] Official Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists"

Language: Python | License: NOASSERTION | 515 32 8

Groma

Grounded Multimodal Large Language Model with Localized Visual Tokenization

Language: Python | License: Apache-2.0 | 464 36 14

MIGC

[CVPR 2024 Highlight] "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis" (Official Implementation)

Language: Python | License: NOASSERTION | 412 18 9

Prompt-Diffusion

Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"

Language: Python | License: Apache-2.0 | 362 7 13

ReMoDiffuse

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

Language: Python | License: NOASSERTION | 305 16 18

Inf-DiT

Official implementation of Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer

Language: Python | License: Apache-2.0 | 268 22 14

scaling_on_scales

When do we not need larger vision models?

Language: Python | License: MIT | 247 4 13

coconut_cvpr2024

Language: Jupyter Notebook | License: Apache-2.0 | 127 4 15

MoMA

MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation

Language: Jupyter Notebook | 126 2 6

SPTSv2

The official implementation of SPTS v2: Single-Point Text Spotting

Language: Python | License: Apache-2.0 | 119 5 19

GenerateU

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

Language: Python | 112 5 13

MS-Diffusion

Official implementation of MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance

Language: Python | License: MIT | 69 3 4

llmblueprint

[ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts"

Language: Jupyter Notebook | 59 2 4

DisenDiff

[CVPR 2024, Oral] Attention Calibration for Disentangled Text-to-Image Personalization

Language: Python | License: MIT | 59 3 6

Awesome-Open-Vocabulary-Detection-and-Segmentation

Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

58 10

HQ-Edit

HQ-Edit: A High-Quality and High-Coverage Dataset for General Image Editing

Language: Python | License: NOASSERTION | 57 6 5

lvlm-interpret

Language: Python | License: Apache-2.0 | 9 1 3