Kai Niu's repositories
Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
generative-models
Generative Models by Stability AI
HarmonyView
Official PyTorch implementation of "HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D"
i2vgen-xl
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
ICONIP2019
Code and dataset for ICONIP2019
InstantID
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
LangSplat
Official implementation of the paper "LangSplat: 3D Language Gaussian Splatting"
MetaTransformer
Meta-Transformer for Unified Multimodal Learning
PHD
Multi-modality generative foundation models, Parameter efficient fine-tuning, Large language models, Contrastive Language–Image Pre-training, Text-video pre-training
stable-diffusion-webui
Stable Diffusion web UI
UoIDLHealthcare
Deep Learning for Healthcare Specialization
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
lazypredict
Lazy Predict helps build many basic models with little code and helps you understand which models work better without any parameter tuning
LLaVA-Med
Large Language-and-Vision Assistant for BioMedicine, built towards multimodal GPT-4 level capabilities.
MagicTime
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
MiDaS
Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
sd-webui-text2video
Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies
TESTA
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
Text-To-Video-Finetuning
Finetune ModelScope's Text To Video model using Diffusers 🧨
video-generation-survey
A reading list of video generation
videocomposer
Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability
VideoDirectorGPT
Official implementation of VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning