Kai Niu's repositories

Stargazers: 0, Issues: 0

Flipped-VQA

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

Language: Python, License: MIT, Stargazers: 0, Issues: 0

generative-models

Generative Models by Stability AI

Language: Python, License: MIT, Stargazers: 0, Issues: 0

HarmonyView

Official PyTorch implementation of "HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D"

Language: Python, License: MIT, Stargazers: 0, Issues: 0

i2vgen-xl

Official repo for VGen: a holistic video generation ecosystem built on diffusion models

Language: Python, Stargazers: 0, Issues: 0
Language: Jupyter Notebook, Stargazers: 0, Issues: 0

ICONIP2019

Code and dataset for ICONIP2019

Language: Jupyter Notebook, Stargazers: 0, Issues: 0

InstantID

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Language: Python, License: Apache-2.0, Stargazers: 0, Issues: 0

jepa

PyTorch code and models for V-JEPA self-supervised learning from video.

Language: Python, License: NOASSERTION, Stargazers: 0, Issues: 0

LangSplat

Official implementation of the paper "LangSplat: 3D Language Gaussian Splatting"

Language: Python, License: NOASSERTION, Stargazers: 0, Issues: 0
Stargazers: 0, Issues: 0

MetaTransformer

Meta-Transformer for Unified Multimodal Learning

Language: Python, License: Apache-2.0, Stargazers: 0, Issues: 0

PHD

Multi-modality generative foundation models, parameter-efficient fine-tuning, large language models, Contrastive Language–Image Pre-training, text-video pre-training

Language: Jupyter Notebook, Stargazers: 0, Issues: 0

stable-diffusion-webui

Stable Diffusion web UI

Language: Python, License: AGPL-3.0, Stargazers: 0, Issues: 0

UoIDLHealthcare

Deep Learning for Healthcare Specialization

Language: Jupyter Notebook, Stargazers: 0, Issues: 0

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

License: BSD-3-Clause, Stargazers: 0, Issues: 0

lazypredict

Lazy Predict helps build many basic models with little code and helps identify which models work better without any parameter tuning

License: MIT, Stargazers: 0, Issues: 0

LLaVA-Med

Large Language-and-Vision Assistant for BioMedicine, built towards multimodal GPT-4 level capabilities.

License: NOASSERTION, Stargazers: 0, Issues: 0

MagicTime

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

License: Apache-2.0, Stargazers: 0, Issues: 0

MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"

License: MIT, Stargazers: 0, Issues: 0

NExT-GPT

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

License: BSD-3-Clause, Stargazers: 0, Issues: 0
License: Apache-2.0, Stargazers: 0, Issues: 0
License: Apache-2.0, Stargazers: 0, Issues: 0

sd-webui-text2video

Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies

License: NOASSERTION, Stargazers: 0, Issues: 0

TESTA

[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding

License: MIT, Stargazers: 0, Issues: 0

Text-To-Video-Finetuning

Finetune ModelScope's Text To Video model using Diffusers 🧨

License: MIT, Stargazers: 0, Issues: 0

video-generation-survey

A reading list of video generation

Stargazers: 0, Issues: 0

videocomposer

Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability

License: MIT, Stargazers: 0, Issues: 0

VideoDirectorGPT

Official implementation of VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning

Stargazers: 0, Issues: 0
License: MIT, Stargazers: 0, Issues: 0