Beast code in Giters

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook | Apache-2.0 | 1429600

segment-anything-with-clip

Segment Anything combined with CLIP

Language: Python | Apache-2.0 | 32100

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python | Apache-2.0 | 310800

InstMatt

Official repository for Instance Human Matting via Mutual Guidance and Multi-Instance Refinement

Language: Python | 9900

MPEblink

[CVPR 2023] Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed Video

Language: Python | Apache-2.0 | 4800

ViTGaze

Language: Python | MIT | 2900

DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"

Language: Python | Apache-2.0 | 209900

GLEE

[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

Language: Python | MIT | 97800

vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Language: Python | MIT | 1900700

NaViT

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

Language: Python | MIT | 15000

learning_research

My research experience

499600

VM-UNetV2

Language: Python | Apache-2.0 | 6700

EMO

Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

725300

ptp

[CVPR2023] The code for "Position-guided Text Prompt for Vision-Language Pre-training"

Language: Python | Apache-2.0 | 14700

Awesome-state-space-models

Collection of papers on state-space models

48700

mamba

The Fast Cross-Platform Package Manager

Language: C++ | BSD-3-Clause | 658700

CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

Language: Jupyter Notebook | MIT | 2381000