Xiaowei Chi's starred repositories

3D-VLA

[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model

Language: Python · Stargazers: 324 · Issues: 0

phenaki-pytorch

Implementation of Phenaki Video, which uses MaskGIT to produce text-guided videos of up to 2 minutes in length, in PyTorch

Language: Python · License: MIT · Stargazers: 747 · Issues: 0

Awesome-Video-Robotic-Papers

This repository compiles a list of papers related to the application of video technology in the field of robotics! Star⭐ the repo and follow me if you like what you see🤩.

Stargazers: 114 · Issues: 0

videocrafter-training-pytorch

Training code for VideoCrafter.

Language: Python · License: NOASSERTION · Stargazers: 4 · Issues: 0

MMTrail-Pytorch

[arXiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

Stargazers: 2 · Issues: 0

awesome-diffusion-model-in-rl

A curated list of Diffusion Model in RL resources (continually updated)

License: Apache-2.0 · Stargazers: 767 · Issues: 0

1xgpt

World modeling challenge for humanoid robots

Language: Python · License: Apache-2.0 · Stargazers: 323 · Issues: 0

Pointcept

Pointcept: a codebase for point cloud perception research. Latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)

Language: Python · License: MIT · Stargazers: 1542 · Issues: 0

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python · License: Apache-2.0 · Stargazers: 21826 · Issues: 0

Awesome-Embodied-AI

A curated list of awesome papers on Embodied AI and related research/industry-driven resources.

License: MIT · Stargazers: 263 · Issues: 0

AlignProp

AlignProp uses direct reward backpropagation for the alignment of large-scale text-to-image diffusion models. Our method is 25x more sample- and compute-efficient than reinforcement learning methods (PPO) for finetuning Stable Diffusion.

Language: Python · License: MIT · Stargazers: 233 · Issues: 0

MMTrail

[arXiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

Stargazers: 22 · Issues: 0

1d-tokenizer

This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation"

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 415 · Issues: 0

MMWorld

Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"

Language: Python · License: MIT · Stargazers: 20 · Issues: 0

calvin

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Language: Python · License: MIT · Stargazers: 373 · Issues: 0

open_flamingo

An open-source framework for training large multimodal models.

Language: Python · License: MIT · Stargazers: 3690 · Issues: 0

Pandora

Pandora: Towards General World Model with Natural Language Actions and Video States

Language: Python · Stargazers: 469 · Issues: 0
Language: Python · License: AGPL-3.0 · Stargazers: 513 · Issues: 0
Stargazers: 1157 · Issues: 0

lvm_datapipe

Data pipeline code for a large video generation model

Language: Python · Stargazers: 7 · Issues: 0

Seeing-and-Hearing

[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Language: Python · License: NOASSERTION · Stargazers: 119 · Issues: 0

SVD_Xtend

Stable Video Diffusion Training Code and Extensions.

Language: Python · Stargazers: 578 · Issues: 0

Latte

Latte: Latent Diffusion Transformer for Video Generation.

Language: Python · License: Apache-2.0 · Stargazers: 1658 · Issues: 0

LaVIT

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

Language: Jupyter Notebook · License: NOASSERTION · Stargazers: 509 · Issues: 0

MMDialog

The official site of the paper "MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation"

Language: Python · Stargazers: 189 · Issues: 0

magvit

Official JAX implementation of MAGVIT: Masked Generative Video Transformer

Language: Python · License: Apache-2.0 · Stargazers: 946 · Issues: 0

magvit2-pytorch

Implementation of MagViT2 Tokenizer in PyTorch

Language: Python · License: MIT · Stargazers: 551 · Issues: 0

AnimateDiff

Official implementation of AnimateDiff.

Language: Python · License: Apache-2.0 · Stargazers: 10386 · Issues: 0
Language: Python · Stargazers: 32 · Issues: 0

Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

Language: HTML · Stargazers: 318 · Issues: 0