Zhenhailong Wang's starred repositories
Open-MAGVIT2
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
1d-tokenizer
This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation"
Awesome-Video-Datasets
Video datasets
Multimodal-AND-Large-Language-Models
A paper list on multimodal and large language models, kept only as a personal record of papers I read from the daily arXiv listings.
maze-dataset
maze datasets for investigating OOD behavior of ML systems
Tracking-Anything-with-DEVA
[ICCV 2023] Tracking Anything with Decoupled Video Segmentation
atp-video-language
Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (ATP).
Solo-Performance-Prompting
Repo for paper "Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration"
tree-of-thought-llm
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
InternVideo
[ECCV 2024] Video Foundation Models & Data for Multimodal Understanding
dalle2-laion
Pretrained DALL-E 2 from LAION
imagen-pytorch
Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch
DALLE2-pytorch
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
LookForTheChange
Code for the "Look for the Change" paper, published at CVPR 2022
procthor-10k
The ProcTHOR-10K Houses Dataset