isrkhou's starred repositories
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
VLM_survey
Collection of AWESOME vision-language models for vision tasks
Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
guidance-based-video-grounding
[ICCV 2023] The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"
YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
supervision
We write your reusable computer vision tools. 💜
Ask-Anything
[CVPR 2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LLMs, such as MiniGPT-4, StableLM, and MOSS.
InternGPT
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation