OpenGVLab's repositories
LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
InternGPT
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It currently supports DragGAN, ChatGPT, ImageBind, GPT-4-style multimodal chat, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
Ask-Anything
[CVPR 2024] [VideoChatGPT] ChatGPT with video understanding! Also supports many more language models, such as MiniGPT-4, StableLM, and MOSS.
InternImage
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
InternVideo
InternVideo: General Video Foundation Models via Generative and Discriminative Learning (https://arxiv.org/abs/2212.03191)
VideoMAEv2
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
all-seeing
[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"
Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
UniFormerV2
[ICCV 2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
unmasked_teacher
[ICCV 2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
HumanBench
The official implementation of HumanBench (CVPR 2023)
ControlLLM
ControlLLM: Augment Language Models with Tools by Searching on Graphs
MM-Interleaved
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Vision-RWKV
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Awesome-DragGAN
Awesome-DragGAN: A curated list of papers, tutorials, repositories related to DragGAN
ego4d-eccv2022-solutions
Champion solutions for the Ego4D Challenge at ECCV 2022
Multitask-Model-Selector
Implementation of "Foundation Model is Efficient Multimodal Multitask Model Selector"
InternVL-MMDetSeg
Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
perception_test_iccv2023
Champion solutions repository for the Perception Test challenges at the ICCV 2023 workshop.
LLMPrune-BESA
BESA is a differentiable weight pruning technique for large language models.