xiaoyazhu's starred repositories
Awesome-Open-Vocabulary-Object-Detection
A curated list of papers, datasets and resources pertaining to open vocabulary object detection.
awesome-open-world-object-detection
This repository lists awesome public open-world object detection projects and resources.
Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles: Latest papers and datasets on Multimodal Large Language Models, and their evaluation.
Awesome-Open-Vocabulary
(TPAMI 2024) A Survey on Open Vocabulary Learning
GroundingDINO
Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
awesome-described-object-detection
A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection and Referring Expression Comprehension. Updated frequently and pull requests welcomed.
YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
openimages2coco
Convert Open Images annotations into MS COCO format to make them a drop-in replacement.
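To illustrate the kind of conversion this repository automates: Open Images stores boxes as normalized corner coordinates (XMin, XMax, YMin, YMax), while COCO expects absolute `[x, y, width, height]` boxes inside an `images`/`annotations`/`categories` JSON structure. The sketch below is a hypothetical, minimal version of that mapping (function names and the simplified record layout are illustrative, not the repository's actual API):

```python
def openimages_box_to_coco(box, img_w, img_h):
    """Convert one normalized Open Images box to a COCO-style [x, y, w, h] bbox."""
    x = box["XMin"] * img_w
    y = box["YMin"] * img_h
    w = (box["XMax"] - box["XMin"]) * img_w
    h = (box["YMax"] - box["YMin"]) * img_h
    return [round(x, 2), round(y, 2), round(w, 2), round(h, 2)]

def to_coco(images, boxes, categories):
    """Assemble a COCO-style annotation dict from simplified Open Images records.

    images:     {image_id: (width_px, height_px)}
    boxes:      list of dicts with ImageID / LabelName / XMin / XMax / YMin / YMax
    categories: {label_name: category_id}
    """
    coco = {
        "images": [{"id": i, "width": wh[0], "height": wh[1]}
                   for i, wh in images.items()],
        "categories": [{"id": cid, "name": name}
                       for name, cid in categories.items()],
        "annotations": [],
    }
    for ann_id, b in enumerate(boxes, start=1):
        img_w, img_h = images[b["ImageID"]]
        bbox = openimages_box_to_coco(b, img_w, img_h)
        coco["annotations"].append({
            "id": ann_id,
            "image_id": b["ImageID"],
            "category_id": categories[b["LabelName"]],
            "bbox": bbox,
            "area": bbox[2] * bbox[3],  # COCO uses pixel area for box annotations
            "iscrowd": 0,
        })
    return coco
```

The real converter additionally handles label-name mapping, segmentation masks, and licensing metadata; this only shows the coordinate and schema translation.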
RegionCLIP
[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
open-images-dataset
Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.
X-AnyLabeling
Effortless data labeling with AI support from Segment Anything and other awesome models.
efficientvit
EfficientViT is a family of vision models for efficient high-resolution tasks.
UniDetector
Code release for our CVPR 2023 paper "Detecting Everything in the Open World: Towards Universal Object Detection".
Grounded-Segment-Anything
Grounded-SAM: Marrying Grounding-DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
VisualGLM-6B
A Chinese-English bilingual multimodal conversational language model.
BrnoCompSpeed
Code for BrnoCompSpeed dataset
CRAFT-Reimplementation
CRAFT-PyTorch: a PyTorch reimplementation of "Character Region Awareness for Text Detection".
mmtracking
OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.
deep_sort_pytorch
Multiple object tracking (MOT) using DeepSORT and YOLOv3 with PyTorch.
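At the core of DeepSORT-style tracking-by-detection is an association step that matches existing track boxes to new detections. The sketch below shows a simplified, IoU-only greedy variant of that step (DeepSORT proper solves the assignment with the Hungarian algorithm over a combined motion and appearance cost, and maintains tracks with a Kalman filter); all names here are illustrative, not the repository's API:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_match(tracks, detections, iou_threshold=0.3):
    """Greedily pair track boxes with detection boxes by descending IoU.

    Returns (matches, unmatched_track_idxs, unmatched_det_idxs); unmatched
    detections typically spawn new tracks, unmatched tracks age out.
    """
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap too little to be the same object
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    unmatched_t = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_d = [i for i in range(len(detections)) if i not in used_d]
    return matches, unmatched_t, unmatched_d
```

The "deep" in DeepSORT refers to replacing part of this IoU cost with distances between learned appearance embeddings, which keeps identities stable through occlusions; the greedy loop above is only the skeleton of the idea.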
yolo_tracking
BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models