Wenhao Chai's repositories
StableVideo
[ICCV 2023] StableVideo: Text-driven Consistency-aware Diffusion Video Editing
Awesome-VQVAE
A collection of resources and papers on Vector Quantized Variational Autoencoder (VQ-VAE) and its applications
Awesome-DriveLM
A collection of resources and papers on Large Language Models in autonomous driving
arxiv-daily
Automatically updates papers in selected fields daily using GitHub Actions (refreshed every 12 hours)
Awesome-LLM-3D
Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world
3D-VisTA
Official implementation of ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment"
all-seeing
This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"
awesome-3D-gaussian-splatting
Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.
Awesome-Foundation-Models
A curated list of foundation models for vision and language tasks
Awesome-Long-Context
A curated list of resources about long-context in large-language models and video understanding.
Awesome-MLLM-Hallucination
A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs)
Awesome-Multimodal-Large-Language-Models
Latest Papers and Datasets on Multimodal Large Language Models
awesome-NeRF
A curated list of awesome neural radiance fields papers
Awesome-Skeleton-based-Action-Recognition
A curated paper list of awesome skeleton-based action recognition.
DriveLM
DriveLM: Drive on Language
ED-Pose
[ICLR 2023] Official implementation of the paper "Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation"
ipl-uw.github.io
Website for IPL
LLaMA-Efficient-Tuning
Easy-to-use fine-tuning framework using PEFT (PT+SFT+RLHF with QLoRA) (LLaMA-2, BLOOM, Falcon, Baichuan)
LLM-Agent-Paper-List
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
minisora
The Mini Sora project aims to explore the implementation path and future development direction of Sora.
Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
rese1f
Config files for my GitHub profile.