KimSeHyung (sayybro)

0 followers, 0 following

KimSeHyung's starred repositories

Video-of-Thought

Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"

License: Apache-2.0, Stargazers: 25, Issues: 0

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python, License: Apache-2.0, Stargazers: 18366, Issues: 0

xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python, License: Apache-2.0, Stargazers: 3450, Issues: 0

TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models

Language: Python, License: Apache-2.0, Stargazers: 514, Issues: 0

FreeMan_API

Official Repository for FreeMan dataset

Language: Python, License: MIT, Stargazers: 34, Issues: 0
Language: Python, License: MIT, Stargazers: 21, Issues: 0

SportsHHI

[CVPR 2024] SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos

Language: Python, Stargazers: 9, Issues: 0

OED

Official implementation of paper "OED: Towards One-stage End-to-End Dynamic Scene Graph Generation".

Language: Python, License: Apache-2.0, Stargazers: 7, Issues: 0

SpeaQ

Official PyTorch implementation of "Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection" (CVPR 2024).

Language: Python, Stargazers: 18, Issues: 0

VT-TWINS

Video-Text Representation Learning via Differentiable Weak Temporal Alignment (CVPR 2022)

Language: Python, Stargazers: 14, Issues: 0

MELTR

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)

Language: Python, License: MIT, Stargazers: 32, Issues: 0

OVQA

Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 2023)

Language: Python, Stargazers: 15, Issues: 0

MCTF

Official implementation of CVPR 2024 paper "Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers".

Language: Python, License: MIT, Stargazers: 19, Issues: 0

DDMI

Official Implementation (Pytorch) of "DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations", ICLR 2024

Language: Python, License: MIT, Stargazers: 18, Issues: 0

vid-TLDR

Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer".

Language: Python, License: MIT, Stargazers: 25, Issues: 0

SPoTr

Official pytorch implementation of "Self-positioning Point-based Transformer for Point Cloud Understanding" (CVPR 2023).

Language: Python, Stargazers: 85, Issues: 0

RALF

Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".

License: MIT, Stargazers: 21, Issues: 0
Language: Python, License: NOASSERTION, Stargazers: 104, Issues: 0
License: Apache-2.0, Stargazers: 69, Issues: 0

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python, License: Apache-2.0, Stargazers: 5190, Issues: 0
Language: Python, License: NOASSERTION, Stargazers: 117, Issues: 0

3dlfm

Official codebase for 3D-LFM paper. Accepted at CVPR, 2024.

Language: Jupyter Notebook, License: BSD-3-Clause, Stargazers: 51, Issues: 0

TCFormer

The codes for TCFormer in paper: Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer

Language: Python, License: Apache-2.0, Stargazers: 197, Issues: 0

FusionFormer

FusionFormer: A Concise Unified Feature Fusion Transformer for 3D Pose Estimation

Stargazers: 3, Issues: 0

ContextAware-PoseFormer

The project is an official implementation of our paper "A Single 2D Pose With Context is Worth Hundreds for 3D Human Pose Estimation".

Language: Python, Stargazers: 64, Issues: 0
Language: Python, Stargazers: 190, Issues: 0
Language: Python, Stargazers: 36, Issues: 0

MVGFormer

This is the official implementation of the work presented at CVPR 2024, titled Multiple View Geometry Transformers for 3D Human Pose Estimation (MVGFormer).

License: Apache-2.0, Stargazers: 23, Issues: 0

HoT

[CVPR 2024 🔥] Official implementation of the paper "⏳ Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation"

Language: Python, License: MIT, Stargazers: 145, Issues: 0

multi-hmr

Pytorch demo code and models for Multi-HMR

Language: Python, License: NOASSERTION, Stargazers: 157, Issues: 0