felixfuu's starred repositories

infinite-zoom-automatic1111-webui

infinite zoom effect extension for AUTOMATIC1111's webui - stable diffusion

Language: Python, License: MIT, Stargazers: 651, Issues: 0

MS-Diffusion

Official implementation of MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance

Language: Python, License: MIT, Stargazers: 38, Issues: 0

IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt.

Language: Jupyter Notebook, License: Apache-2.0, Stargazers: 4375, Issues: 0

MimicBrush

Official implementations for paper: Zero-shot Image Editing with Reference Imitation

Language: Python, License: Apache-2.0, Stargazers: 490, Issues: 0

style-aligned

Official code for "Style Aligned Image Generation via Shared Attention"

Language: Python, License: Apache-2.0, Stargazers: 1120, Issues: 0
Language: Python, License: Apache-2.0, Stargazers: 9, Issues: 0

chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Language: Python, License: NOASSERTION, Stargazers: 1188, Issues: 0

HQ-Edit

HQ-Edit: A High-Quality and High-Coverage Dataset for General Image Editing

Language: Python, License: NOASSERTION, Stargazers: 57, Issues: 0

ReMoDiffuse

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

Language: Python, License: NOASSERTION, Stargazers: 305, Issues: 0

Inf-DiT

Official implementation of Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer

Language: Python, License: Apache-2.0, Stargazers: 261, Issues: 0

Awesome-Open-Vocabulary-Detection-and-Segmentation

Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Stargazers: 53, Issues: 0

Prompt-Diffusion

Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"

Language: Python, License: Apache-2.0, Stargazers: 359, Issues: 0
Language: Python, Stargazers: 720, Issues: 0

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python, License: Apache-2.0, Stargazers: 7468, Issues: 0

MoMA

MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation

Language: Jupyter Notebook, Stargazers: 123, Issues: 0

HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Language: Python, License: NOASSERTION, Stargazers: 2479, Issues: 0

SPTSv2

The official implementation of SPTS v2: Single-Point Text Spotting

Language: Python, License: Apache-2.0, Stargazers: 119, Issues: 0

IC-Light

More relighting!

Language: Python, License: Apache-2.0, Stargazers: 3846, Issues: 0

cv-arxiv-daily

🎓 Automatically Update CV Papers Daily using GitHub Actions (Update Every 12 Hours)

Language: Python, License: Apache-2.0, Stargazers: 797, Issues: 0

GenerateU

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

Language: Python, Stargazers: 110, Issues: 0

mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Language: Python, License: Apache-2.0, Stargazers: 1075, Issues: 0

Groma

Grounded Multimodal Large Language Model with Localized Visual Tokenization

Language: Python, License: Apache-2.0, Stargazers: 461, Issues: 0

MIGC

[CVPR 2024 Highlight] "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis" (Official Implementation)

Language: Python, License: NOASSERTION, Stargazers: 409, Issues: 0

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. An open-source multimodal dialogue model, available for commercial use, with performance approaching GPT-4V.

Language: Python, License: MIT, Stargazers: 3654, Issues: 0

llmblueprint

[ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts"

Language: Jupyter Notebook, Stargazers: 59, Issues: 0

scaling_on_scales

When do we not need larger vision models?

Language: Python, License: MIT, Stargazers: 247, Issues: 0
Language: Jupyter Notebook, License: Apache-2.0, Stargazers: 127, Issues: 0

DisenDiff

[CVPR 2024, Oral] Attention Calibration for Disentangled Text-to-Image Personalization

Language: Python, License: MIT, Stargazers: 59, Issues: 0

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python, License: MIT, Stargazers: 3719, Issues: 0

InstructCV

[ICLR 2024] Official Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists"

Language: Python, License: NOASSERTION, Stargazers: 515, Issues: 0