Yinan He's starred repositories
LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
InternGPT
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It currently supports DragGAN, ChatGPT, ImageBind, multimodal chat in the style of GPT-4, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
sd-webui-animatediff
AnimateDiff for AUTOMATIC1111 Stable Diffusion WebUI
Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous quantitative evaluation benchmark for video-based conversational models.
VideoMamba
[ECCV 2024] VideoMamba: State Space Model for Efficient Video Understanding
all-seeing
[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"
Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena lets you benchmark vision-language models side by side with images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
self-correction-llm-papers
A collection of research papers on self-correcting large language models with automated feedback.
VideoBooth
[CVPR 2024] VideoBooth: Diffusion-based Video Generation with Image Prompts
OmniCorpus
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
ForgeryNet
[CVPR 2021 Oral] ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
EgoExoLearn
[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
GPT-4V-API
Self-hosted GPT-4V API
video-fingerprinting
VisioForge Video Fingerprinting SDK Demos