felixfuu's starred repositories

Track-Anything

Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.

Language: Python · License: MIT · Stargazers: 6237 · Issues: 0

RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

Language: Jupyter Notebook · Stargazers: 1583 · Issues: 0

NExT-Chat

The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".

Language: Python · License: Apache-2.0 · Stargazers: 180 · Issues: 0
Language: Python · License: Apache-2.0 · Stargazers: 533 · Issues: 0
Language: Python · License: Apache-2.0 · Stargazers: 288 · Issues: 0

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 14003 · Issues: 0

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python · License: NOASSERTION · Stargazers: 2600 · Issues: 0

MQ-Det

Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)

Language: Python · License: Apache-2.0 · Stargazers: 244 · Issues: 0

GLEE

[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

Language: Python · License: MIT · Stargazers: 957 · Issues: 0
Language: Python · Stargazers: 92 · Issues: 0

VGen

Official repo for VGen: a holistic video generation ecosystem building on diffusion models

Language: Python · Stargazers: 2729 · Issues: 0

AnyDoor

Official implementation of the paper "AnyDoor: Zero-shot Object-level Image Customization"

Language: Python · License: MIT · Stargazers: 3819 · Issues: 0

awesome-diffusion-categorized

Collection of diffusion model papers categorized by their subareas

Stargazers: 918 · Issues: 0

PixArt-alpha

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Language: Python · License: AGPL-3.0 · Stargazers: 2497 · Issues: 0

HumanBench

This repo is the official implementation of HumanBench (CVPR 2023)

Language: Python · License: MIT · Stargazers: 215 · Issues: 0

BLIText

[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

Language: Python · License: BSD-3-Clause · Stargazers: 22 · Issues: 0

LLM-in-Vision

Recent LLM-based CV and related works. Welcome to comment/contribute!

Stargazers: 781 · Issues: 0

consistencydecoder

Consistency Distilled Diff VAE

Language: Python · License: MIT · Stargazers: 2095 · Issues: 0

LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

Language: Python · License: Apache-2.0 · Stargazers: 651 · Issues: 0

Entity

EntitySeg Toolbox: Towards Open-World and High-Quality Image Segmentation

Language: Jupyter Notebook · License: NOASSERTION · Stargazers: 679 · Issues: 0

Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Language: Python · License: MIT · Stargazers: 3506 · Issues: 0

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Language: Python · Stargazers: 642 · Issues: 0

ALIKE

ALIKE: Accurate and Lightweight Keypoint Detection and Descriptor Extraction

Language: Python · License: BSD-3-Clause · Stargazers: 292 · Issues: 0

T2I-Adapter

T2I-Adapter

Language: Python · Stargazers: 3281 · Issues: 0

COMM

PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models"

License: MIT · Stargazers: 178 · Issues: 0

Mini-DALLE3

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models

Language: Python · Stargazers: 294 · Issues: 0

MiniGPT-5

Official implementation of the paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"

Language: Python · License: Apache-2.0 · Stargazers: 831 · Issues: 0
License: Apache-2.0 · Stargazers: 4736 · Issues: 0

stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Language: Python · License: MIT · Stargazers: 37323 · Issues: 0
Language: Python · License: BSD-3-Clause · Stargazers: 334 · Issues: 0