LiangXu123



Company: SenseTime

Location: Shenzhen


LiangXu123's starred repositories


BridgeQA

[AAAI 24] Official Codebase for BridgeQA: Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA

Language: Python | License: NOASSERTION | Stargazers: 6 | Issues: 0

OV-SAM3D

Open-Vocabulary SAM3D: Understand Any 3D Scene

License: MIT | Stargazers: 15 | Issues: 0

PLA

(CVPR 2023) PLA: Language-Driven Open-Vocabulary 3D Scene Understanding & (CVPR 2024) RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding

Language: Python | License: Apache-2.0 | Stargazers: 224 | Issues: 0

embodied-generalist

[ICML 2024] Official code repository for 3D embodied generalist agent LEO

Language: Python | License: MIT | Stargazers: 260 | Issues: 0

Awesome-LLM

Awesome-LLM: a curated list of resources on Large Language Models

License: CC0-1.0 | Stargazers: 15432 | Issues: 0

LL3DA

[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.

Language: Python | License: MIT | Stargazers: 186 | Issues: 0

Vote2Cap-DETR

[CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods

Language: Python | License: MIT | Stargazers: 75 | Issues: 0

Awesome-LLM-3D

Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world

License: MIT | Stargazers: 759 | Issues: 0

MovieChat

[CVPR 2024] 🎬💭 chat with over 10K frames of video!

Language: Python | License: BSD-3-Clause | Stargazers: 432 | Issues: 0

activitynet-qa

A VideoQA dataset based on videos from ActivityNet

Language: Python | License: Apache-2.0 | Stargazers: 56 | Issues: 0

aifasthub

AI快站 (AI Fast Station) is a HuggingFace resource mirror and download-acceleration website built for AI developers.

Language: Python | Stargazers: 4 | Issues: 0

Awesome_Long_Form_Video_Understanding

Awesome papers & datasets specifically focused on long-term videos.

Stargazers: 96 | Issues: 0

Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Language: Python | License: Apache-2.0 | Stargazers: 672 | Issues: 0

VTimeLLM

[CVPR 2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

Language: Python | License: NOASSERTION | Stargazers: 151 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 127377 | Issues: 0

MiniGPT4-video

Official code for MiniGPT4-video

Language: Python | License: BSD-3-Clause | Stargazers: 416 | Issues: 0

llama-recipes

Scripts for fine-tuning Meta Llama 3 with composable FSDP and PEFT methods, covering single-node and multi-node GPU setups. Supports default and custom datasets for applications such as summarization and Q&A, and a number of inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama 3 for WhatsApp and Messenger.

Language: Jupyter Notebook | Stargazers: 10077 | Issues: 0

Awesome-LLMs-for-Video-Understanding

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

Stargazers: 811 | Issues: 0

MiniGPT-4

Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Language: Python | License: BSD-3-Clause | Stargazers: 25059 | Issues: 0

vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch

Language: Python | License: MIT | Stargazers: 18471 | Issues: 0

CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

Language: Jupyter Notebook | License: MIT | Stargazers: 23003 | Issues: 0

all-in-one

[CVPR 2023] All in One: Exploring Unified Video-Language Pre-training

Language: Python | Stargazers: 274 | Issues: 0

awesome-Vision-and-Language-Pre-training

Recent Advances in Vision and Language Pre-training (VLP)

License: Apache-2.0 | Stargazers: 281 | Issues: 0

jimmy-narang.github.io

A beautiful, simple, clean, and responsive Jekyll theme for academics

Language: HTML | License: MIT | Stargazers: 1 | Issues: 0