isrkhou's starred repositories

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language: Python · License: Apache-2.0 · Stargazers: 1550

CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Language: Python · License: Apache-2.0 · Stargazers: 1593

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

Stargazers: 10698

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python · License: Apache-2.0 · Stargazers: 3098

InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Language: Python · Stargazers: 2240

DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language: Python · License: MIT · Stargazers: 1882

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal chat model approaching GPT-4V performance.

Language: Python · License: MIT · Stargazers: 4192

VLM_survey

Collection of AWESOME vision-language models for vision tasks

Stargazers: 2021

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python · License: Apache-2.0 · Stargazers: 7987

SoM

Set-of-Mark Prompting for LMMs

Language: Python · License: MIT · Stargazers: 1039

SoM-LLaVA

[COLM-2024] List Items One by One: Empowering Multimodal LLMs with Set-of-Mark Prompting and Improved Visual Reasoning Ability.

Language: Python · Stargazers: 98

Video-LLaVA

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

Language: Python · Stargazers: 229

UMT

🎬 UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (CVPR 2022)

Stargazers: 7

guidance-based-video-grounding

[ICCV 2023] The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"

Stargazers: 14

HiREST

Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)

Language: Python · License: MIT · Stargazers: 87

CGDETR

Official PyTorch repository for CG-DETR: "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

Language: Python · License: NOASSERTION · Stargazers: 95

QD-DETR

Official PyTorch repository for "QD-DETR: Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023)

Language: Python · License: NOASSERTION · Stargazers: 184

DINOv

[CVPR 2024] Official implementation of the paper "Visual In-context Learning"

Language: Python · Stargazers: 321

T-Rex

[ECCV 2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language: Python · License: NOASSERTION · Stargazers: 2006

MQ-Det

Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)

Language: Python · License: Apache-2.0 · Stargazers: 249

YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Language: Python · License: GPL-3.0 · Stargazers: 3981

supervision

We write your reusable computer vision tools. 💜

Language: Python · License: MIT · Stargazers: 17936
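
A rough usage sketch (not taken from the repo itself): it assumes an Ultralytics YOLOv8 detector and a local file named image.jpg, with supervision used only to convert and draw the detections.

    import cv2
    import supervision as sv
    from ultralytics import YOLO  # assumed detector; any supported model works

    image = cv2.imread("image.jpg")                         # hypothetical input image
    model = YOLO("yolov8n.pt")                              # small pretrained YOLOv8 checkpoint
    result = model(image)[0]                                # run inference on one image
    detections = sv.Detections.from_ultralytics(result)     # convert results to supervision's format
    annotated = sv.BoxAnnotator().annotate(scene=image.copy(), detections=detections)
    cv2.imwrite("annotated.jpg", annotated)                  # save the image with boxes drawn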

Ask-Anything

[CVPR 2024 Highlight] [VideoChatGPT] ChatGPT with video understanding! Many more LMs are also supported, such as miniGPT4, StableLM, and MOSS.

Language: Python · License: MIT · Stargazers: 2884

InternGPT

InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).

Language: Python · License: Apache-2.0 · Stargazers: 3169

Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

Language: Python · License: Apache-2.0 · Stargazers: 6479
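
A minimal sketch of trying Depth Anything through the Hugging Face transformers depth-estimation pipeline; the checkpoint id below is an assumption and may differ from the variant you want.

    from transformers import pipeline
    from PIL import Image

    # Assumed Hub checkpoint id; substitute the Depth Anything variant you need.
    pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-small-hf")
    image = Image.open("image.jpg")           # hypothetical input image
    result = pipe(image)                      # run monocular depth estimation
    result["depth"].save("depth.png")         # the pipeline returns a PIL depth map under "depth"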

InstantID

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Language: Python · License: Apache-2.0 · Stargazers: 10513

yolov9

Implementation of the paper "YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information"

Language: Python · License: GPL-3.0 · Stargazers: 8602

flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)

Language: Python · License: NOASSERTION · Stargazers: 13739
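
For reference, flair's canonical named-entity-recognition usage looks roughly like this (a minimal sketch using the pretrained "ner" tagger):

    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load("ner")                      # pretrained English NER model
    sentence = Sentence("George Washington went to Washington.")
    tagger.predict(sentence)                                 # tag entities in place
    for entity in sentence.get_spans("ner"):
        print(entity)                                        # e.g. span "George Washington" -> PER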