Yinan He's starred repositories
LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
InternGPT
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It currently supports DragGAN, ChatGPT, ImageBind, multimodal chat in the style of GPT-4, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
sd-webui-animatediff
AnimateDiff for AUTOMATIC1111 Stable Diffusion WebUI
Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous quantitative evaluation benchmark for video-based conversational models.
VideoMamba
[ECCV 2024] VideoMamba: State Space Model for Efficient Video Understanding
all-seeing
[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"
Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena lets you benchmark vision-language models side by side with images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
self-correction-llm-papers
A collection of research papers on self-correcting large language models with automated feedback.
VideoBooth
[CVPR 2024] VideoBooth: Diffusion-based Video Generation with Image Prompts
OmniCorpus
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
ForgeryNet
[CVPR 2021 Oral] ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
EgoExoLearn
[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
GPT-4V-API
Self-hosted GPT-4V API
video-fingerprinting
VisioForge Video Fingerprinting SDK Demos