Multimedia Computing Group, Nanjing University

Repositories of the Multimedia Computing Group, Nanjing University

VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Language: Python | License: NOASSERTION | 1306 16 119

MixFormer

[CVPR 2022 Oral & TPAMI 2024] MixFormer: End-to-End Tracking with Iterative Mixed Attention

Language: Python | License: MIT | 445 7 105

SparseBEV

[ICCV 2023] SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos

Language: Python | License: MIT | 324 9 80

CamLiFlow

[CVPR 2022 Oral & TPAMI 2023] Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion

Language: Python | 216 6 14

SparseOcc

[ECCV 2024] Fully Sparse 3D Occupancy Prediction & RayIoU Evaluation Metric

Language: Python | License: Apache-2.0 | 196 5 38

MeMOTR

[ICCV 2023] MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking

Language: Python | License: MIT | 137 5 19

MixFormerV2

[NeurIPS 2023] MixFormerV2: Efficient Fully Transformer Tracking

Language: Python | License: MIT | 134 10 39

LinK

[CVPR 2023] LinK: Linear Kernel for LiDAR-based 3D Perception

Language: Python | License: MIT | 81 7 7

MOTIP

Multiple Object Tracking as ID Prediction

Language: Python | License: Apache-2.0 | 72 6 24

SGM-VFI

[CVPR 2024] Sparse Global Matching for Video Frame Interpolation with Large Motion

Language: Python | 55 4 1

BIVDiff

[CVPR 2024] BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

Language: Python | 43 2 3

PointTAD

[NeurIPS 2022] PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points

Language: Python | License: Apache-2.0 | 37 4 6

CoMAE

[AAAI 2023 Oral] CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets

Language: Python | 31 2 4

VFIMamba

VFIMamba: Video Frame Interpolation with State Space Models

Language: Python | License: Apache-2.0 | 27 1 0

DEQDet

[ICCV 2023] Deep Equilibrium Object Detection

Language: Jupyter Notebook | 21 3 2

EVAD

[ICCV 2023] Efficient Video Action Detection with Token Dropout and Context Refinement

Language: Python | License: NOASSERTION | 20 2 4

MGMAE

[ICCV 2023] MGMAE: Motion Guided Masking for Video Masked Autoencoding

Language: Python | License: MIT | 19 2 2

SPLAM

[ECCV 2024 Oral] SPLAM: Accelerating Image Generation with Sub-path Linear Approximation Model

Language: Python | License: MIT | 13 0 0

SportsHHI

[CVPR 2024] SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos

Language: Python | 11 0 0

ZeroI2V

[ECCV 2024] ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video

Language: Python | License: Apache-2.0 | 11 0 0

AMD

[CVPR 2024] Asymmetric Masked Distillation for Pre-Training Small Foundation Models

Language: Python | 10 2 1

Dynamic-MDETR

[TPAMI 2024] Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding

Language: Python | 10 0 0

StageInteractor

[ICCV 2023] StageInteractor: Query-based Object Detector with Cross-stage Interaction

Language: Python | License: Apache-2.0 | 9 2 0

VLG

VLG: General Video Recognition with Web Textual Knowledge (https://arxiv.org/abs/2212.01638)

Language: Python | 8 1 0

DGN

[IJCV 2023] Dual Graph Networks for Pose Estimation in Crowded Scenes

Language: Python | 7 2 0

ViT-TAD

[CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos

Language: Python | 7 0 0

VideoEval

VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model

Language: Python | 6 0 0

PRVG

[CVIU 2024] End-to-end dense video grounding via parallel regression

Language: Python | 5 0 0

LogN

[IJCV 2024] Logit Normalization for Long-Tail Object Detection

Language: Python | License: Apache-2.0 | 4 1 0

ProVP

[IJCV] Progressive Visual Prompt Learning with Contrastive Feature Re-formation

Language: Python | 3 0 0