isrkhou's starred repositories
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
VLM_survey
Collection of AWESOME vision-language models for vision tasks
Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
guidance-based-video-grounding
[ICCV 2023] The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"
YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
supervision
We write your reusable computer vision tools. 💜
Ask-Anything
[CVPR 2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LLMs, such as MiniGPT-4, StableLM, and MOSS.
InternGPT
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation