Kai0226

Kai Niu's repositories

EMO

Flipped-VQA

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

Language: Python, License: MIT

generative-models

Generative Models by Stability AI

Language: Python, License: MIT

HarmonyView

Official PyTorch implementation of "HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D"

Language: Python, License: MIT

i2vgen-xl

Official repo for VGen: a holistic video generation ecosystem built on diffusion models

Language: Python

I2VGen-XL-colab

Language: Jupyter Notebook

ICONIP2019

Code and dataset for ICONIP2019

Language: Jupyter Notebook

InstantID

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Language: Python, License: Apache-2.0

jepa

PyTorch code and models for V-JEPA self-supervised learning from video.

Language: Python, License: NOASSERTION

LangSplat

Official implementation of the paper "LangSplat: 3D Language Gaussian Splatting"

Language: Python, License: NOASSERTION

LLM

MetaTransformer

Meta-Transformer for Unified Multimodal Learning

Language: Python, License: Apache-2.0

PHD

Multi-modality generative foundation models, Parameter efficient fine-tuning, Large language models, Contrastive Language–Image Pre-training, Text-video pre-training

Language: Jupyter Notebook

stable-diffusion-webui

Stable Diffusion web UI

Language: Python, License: AGPL-3.0

UoIDLHealthcare

Deep Learning for Healthcare Specialization

Language: Jupyter Notebook

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

License: BSD-3-Clause

lazypredict

Lazy Predict helps build many basic models with little code and shows which models work best without any parameter tuning

License: MIT

LLaVA-Med

Large Language-and-Vision Assistant for BioMedicine, built towards multimodal GPT-4 level capabilities.

License: NOASSERTION

MagicTime

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

License: Apache-2.0

MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"

License: MIT

NExT-GPT

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

License: BSD-3-Clause

normal-depth-diffusion

License: Apache-2.0

richdreamer

License: Apache-2.0

sd-webui-text2video

Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies

License: NOASSERTION

TESTA

[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding

License: MIT

Text-To-Video-Finetuning

Finetune ModelScope's Text To Video model using Diffusers 🧨

License: MIT

video-generation-survey

A reading list on video generation

videocomposer

Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability

License: MIT

VideoDirectorGPT

Official implementation of VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning

WHAM

License: MIT