Beast code in Giters

KimSeHyung's starred repositories

Video-of-Thought

Code for the ICML 2024 paper "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"

License: Apache-2.0 · 25 · 0 · 0

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python · License: Apache-2.0 · 18366 · 0 · 0

xtuner

An efficient, flexible, and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python · License: Apache-2.0 · 3450 · 0 · 0

TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models

Language: Python · License: Apache-2.0 · 514 · 0 · 0

FreeMan_API

Official repository for the FreeMan dataset

Language: Python · License: MIT · 34 · 0 · 0

hoi-prediction-gaze-transformer

Language: Python · License: MIT · 21 · 0 · 0

SportsHHI

[CVPR 2024] SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos

Language: Python · 9 · 0 · 0

OED

Official implementation of paper "OED: Towards One-stage End-to-End Dynamic Scene Graph Generation".

Language: Python · License: Apache-2.0 · 7 · 0 · 0

SpeaQ

Official PyTorch implementation of "Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection" (CVPR 2024).

Language: Python · 18 · 0 · 0

VT-TWINS

Video-Text Representation Learning via Differentiable Weak Temporal Alignment (CVPR 2022)

Language: Python · 14 · 0 · 0

MELTR

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)

Language: Python · License: MIT · 32 · 0 · 0

OVQA

Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 2023)

Language: Python · 15 · 0 · 0

MCTF

Official implementation of CVPR 2024 paper "Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers".

Language: Python · License: MIT · 19 · 0 · 0

DDMI

Official PyTorch implementation of "DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations" (ICLR 2024).

Language: Python · License: MIT · 18 · 0 · 0

vid-TLDR

Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer".

Language: Python · License: MIT · 25 · 0 · 0

SPoTr

Official PyTorch implementation of "Self-positioning Point-based Transformer for Point Cloud Understanding" (CVPR 2023).

Language: Python · 85 · 0 · 0

RALF

Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".

License: MIT · 21 · 0 · 0

posescript

Language: Python · License: NOASSERTION · 104 · 0 · 0

MouSi

License: Apache-2.0 · 69 · 0 · 0

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python · License: Apache-2.0 · 5190 · 0 · 0

PoseGPT

Language: Python · License: NOASSERTION · 117 · 0 · 0

3dlfm

Official codebase for the 3D-LFM paper, accepted at CVPR 2024.

Language: Jupyter Notebook · License: BSD-3-Clause · 51 · 0 · 0

TCFormer

Code for TCFormer from the paper "Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer"

Language: Python · License: Apache-2.0 · 197 · 0 · 0

FusionFormer

FusionFormer: A Concise Unified Feature Fusion Transformer for 3D Pose Estimation

3 · 0 · 0

ContextAware-PoseFormer

The project is an official implementation of our paper "A Single 2D Pose With Context is Worth Hundreds for 3D Human Pose Estimation".

Language: Python · 64 · 0 · 0

PoseGPT

Language: Python · 190 · 0 · 0

KTPFormer

Language: Python · 36 · 0 · 0

MVGFormer

This is the official implementation of the work presented at CVPR 2024, titled Multiple View Geometry Transformers for 3D Human Pose Estimation (MVGFormer).

License: Apache-2.0 · 23 · 0 · 0

HoT

[CVPR 2024 🔥] Official implementation of the paper "⏳ Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation"

Language: Python · License: MIT · 145 · 0 · 0

multi-hmr

PyTorch demo code and models for Multi-HMR

Language: Python · License: NOASSERTION · 157 · 0 · 0