Xueqing Deng's starred repositories

RADIO

Official repository for "AM-RADIO: Reduce All Domains Into One"

Language:PythonLicense:NOASSERTIONStargazers:547Issues:0Issues:0

PixelLM

PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. PixelLM is accepted by CVPR 2024.

Language:PythonLicense:Apache-2.0Stargazers:162Issues:0Issues:0

flux

Official inference repo for FLUX.1 models

Language:PythonLicense:Apache-2.0Stargazers:9030Issues:0Issues:0

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language:PythonLicense:Apache-2.0Stargazers:18793Issues:0Issues:0

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language:PythonLicense:CC-BY-4.0Stargazers:1119Issues:0Issues:0

SimpleTuner

A general fine-tuning kit geared toward diffusion models.

Language:PythonLicense:AGPL-3.0Stargazers:1250Issues:0Issues:0
Language:PythonLicense:Apache-2.0Stargazers:21Issues:0Issues:0

Open-LLaVA-NeXT

An open-source implementation for training LLaVA-NeXT.

Language:PythonStargazers:227Issues:0Issues:0

richhf-18k

RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with the file name of the associated labeled images (no urls or images are included in this dataset).

Stargazers:88Issues:0Issues:0

BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Language:Jupyter NotebookLicense:BSD-3-ClauseStargazers:4568Issues:0Issues:0

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language:PythonLicense:Apache-2.0Stargazers:24707Issues:0Issues:0

stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Language:PythonLicense:MITStargazers:38092Issues:0Issues:0

Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Language:PythonLicense:Apache-2.0Stargazers:509Issues:0Issues:0

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language:PythonLicense:MITStargazers:1142Issues:0Issues:0
Language:PythonLicense:Apache-2.0Stargazers:1977Issues:0Issues:0

DenseDiffusion

Official Pytorch Implementation of DenseDiffusion (ICCV 2023)

Language:Jupyter NotebookLicense:Apache-2.0Stargazers:466Issues:0Issues:0

Omost

Your image is almost there!

Language:PythonLicense:Apache-2.0Stargazers:7103Issues:0Issues:0

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型

Language:PythonLicense:MITStargazers:5089Issues:0Issues:0
Language:Jupyter NotebookLicense:Apache-2.0Stargazers:138Issues:0Issues:0
License:MITStargazers:28Issues:0Issues:0

LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Language:PythonLicense:Apache-2.0Stargazers:1714Issues:0Issues:0

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language:PythonLicense:MITStargazers:3931Issues:0Issues:0

ViTamin

[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"

Language:PythonLicense:Apache-2.0Stargazers:161Issues:0Issues:0

champ

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Language:PythonLicense:MITStargazers:3547Issues:0Issues:0

InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Language:PythonLicense:Apache-2.0Stargazers:2395Issues:0Issues:0

sam-hq

Segment Anything in High Quality [NeurIPS 2023]

Language:PythonLicense:Apache-2.0Stargazers:3608Issues:0Issues:0

InstanceDiffusion

[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"

Language:PythonLicense:Apache-2.0Stargazers:453Issues:0Issues:0

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Language:PythonStargazers:723Issues:0Issues:0

ImageReward

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation

Language:PythonLicense:Apache-2.0Stargazers:1066Issues:0Issues:0

motion_guidance

Code for ICLR 2024 paper "Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators"

Language:PythonStargazers:84Issues:0Issues:0