LiangXu123's starred repositories
embodied-generalist
[ICML 2024] Official code repository for 3D embodied generalist agent LEO
Awesome-LLM
Awesome-LLM: a curated list of Large Language Model resources
Vote2Cap-DETR
[CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods
Awesome-LLM-3D
Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world
activitynet-qa
A VideoQA dataset based on the videos from ActivityNet
Awesome_Long_Form_Video_Understanding
Awesome papers & datasets specifically focused on long-term videos.
Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
MiniGPT4-video
Official code for MiniGPT4-video
llama-recipes
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps to showcase Meta Llama3 for WhatsApp & Messenger.
Awesome-LLMs-for-Video-Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
all-in-one
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
awesome-Vision-and-Language-Pre-training
Recent Advances in Vision and Language Pre-training (VLP)
jimmy-narang.github.io
A beautiful, simple, clean, and responsive Jekyll theme for academics