LiangXu123

LiangXu123's starred repositories

BridgeQA

[AAAI 24] Official Codebase for BridgeQA: Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA

Language: Python · License: NOASSERTION · 600

OV-SAM3D

Open-Vocabulary SAM3D: Understand Any 3D Scene

License: MIT · 1500

PLA

(CVPR 2023) PLA: Language-Driven Open-Vocabulary 3D Scene Understanding & (CVPR2024) RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding

Language: Python · License: Apache-2.0 · 22400

embodied-generalist

[ICML 2024] Official code repository for 3D embodied generalist agent LEO

Language: Python · License: MIT · 26000

Awesome-LLM

Awesome-LLM: a curated list of Large Language Models

License: CC0-1.0 · 1543200

LL3DA

[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.

Language: Python · License: MIT · 18600

Vote2Cap-DETR

[CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods

Language: Python · License: MIT · 7500

Awesome-LLM-3D

Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world

License: MIT · 75900

MovieChat

[CVPR 2024] 🎬💭 chat with over 10K frames of video!

Language: Python · License: BSD-3-Clause · 43200

activitynet-qa

A VideoQA dataset based on the videos from ActivityNet

Language: Python · License: Apache-2.0 · 5600

aifasthub

AI快站 is a HuggingFace resource mirror and acceleration service website built specifically for AI developers.

Language: Python · 400

Awesome_Long_Form_Video_Understanding

Awesome papers & datasets specifically focused on long-term videos.

9600

Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Language: Python · License: Apache-2.0 · 67200

VTimeLLM

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

Language: Python · License: NOASSERTION · 15100

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python · License: Apache-2.0 · 12737700

MiniGPT4-video

Official code for MiniGPT4-video

Language: Python · License: BSD-3-Clause · 41600

llama-recipes

Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps that showcase Meta Llama3 for WhatsApp & Messenger.

Language: Jupyter Notebook · 1007700

uni3drr

llama-recipes

Awesome-LLMs-for-Video-Understanding

MiniGPT-4

vit-pytorch

CVPR-2024-Papers

vision_transformer

CLIP

Boosting-WTAL

Efficient-Prompt

all-in-one

awesome-Vision-and-Language-Pre-training

wangxionghome.github.io

jimmy-narang.github.io