felixfuu's starred repositories

Track-Anything

Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.

Language: Python | License: MIT | 623700

RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

Language: Jupyter Notebook | 158300

NExT-Chat

The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".

Language: Python | License: Apache-2.0 | 18000

unified-io-2

Language: Python | License: Apache-2.0 | 53300

LLaVA-Grounding

Language: Python | License: Apache-2.0 | 28800

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook | License: Apache-2.0 | 1400300

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python | License: NOASSERTION | 260000

MQ-Det

Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)

Language: Python | License: Apache-2.0 | 24400

GLEE

[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

Language: Python | License: MIT | 95700

FIND

Language: Python | 9200

VGen

Official repo for VGen: a holistic video generation ecosystem built on diffusion models

Language: Python | 272900

AnyDoor

Official implementation of the paper "AnyDoor: Zero-shot Object-level Image Customization"

Language: Python | License: MIT | 381900

awesome-diffusion-categorized

A collection of diffusion model papers categorized by subarea

91800

PixArt-alpha

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Language: Python | License: AGPL-3.0 | 249700

HumanBench

This repo is the official implementation of HumanBench (CVPR 2023)

Language: Python | License: MIT | 21500

BLIText

[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

Language: Python | License: BSD-3-Clause | 2200

LLM-in-Vision

Recent LLM-based CV and related works. Comments and contributions are welcome!

78100

consistencydecoder

Consistency Distilled Diff VAE

Language: Python | License: MIT | 209500

LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

Language: Python | License: Apache-2.0 | 65100

Entity

EntitySeg Toolbox: Towards Open-World and High-Quality Image Segmentation

Language: Jupyter Notebook | License: NOASSERTION | 67900

Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Language: Python | License: MIT | 350600

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Language: Python | 64200