xdeng7

Xueqing Deng's starred repositories

RADIO

Official repository for "AM-RADIO: Reduce All Domains Into One"

Language: Python | License: NOASSERTION | 54700

PixelLM

PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. PixelLM was accepted to CVPR 2024.

Language: Python | License: Apache-2.0 | 16200

flux

Official inference repo for FLUX.1 models

Language: Python | License: Apache-2.0 | 903000

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | 1879300

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language: Python | License: CC-BY-4.0 | 111900

SimpleTuner

A general fine-tuning kit geared toward diffusion models.

Language: Python | License: AGPL-3.0 | 125000

CAPTURE

Language: Python | License: Apache-2.0 | 2100

Open-LLaVA-NeXT

An open-source implementation for training LLaVA-NeXT.

Language: Python | 22700

richhf-18k

The RichHF-18K dataset contains the rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with the file names of the associated labeled images (no URLs or images are included in this dataset).

8800

BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Language: Jupyter Notebook | License: BSD-3-Clause | 456800

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | License: Apache-2.0 | 2470700

stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Language: Python | License: MIT | 3809200

Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Language: Python | License: Apache-2.0 | 50900

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python | License: MIT | 114200

LLaVA-NeXT

Language: Python | License: Apache-2.0 | 197700

DenseDiffusion

Official PyTorch Implementation of DenseDiffusion (ICCV 2023)

Language: Jupyter Notebook | License: Apache-2.0 | 46600

Omost

Your image is almost there!

Language: Python | License: Apache-2.0 | 710300

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Language: Python | License: MIT | 508900

coconut_cvpr2024

Language: Jupyter Notebook | License: Apache-2.0 | 13800

NightLab

License: MIT | 2800

LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Language: Python | License: Apache-2.0 | 171400

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python | License: MIT | 393100