video object detection paper

A collection of video object detection papers and datasets.

some notes about papers:

Note: if the images do not display, please check [link] or download the 'fig/' folder.

Performance comparison of state-of-the-art video object detectors without post-processing methods.

Performance comparison of state-of-the-art video object detectors with post-processing methods. ∗ indicates the use of video-level post-processing methods (e.g., Seq-NMS, tubelet rescoring, BLR); △ indicates the use of data augmentation.

Dataset

ImageNet: Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. "ImageNet Large Scale Visual Recognition Challenge". IJCV(2015).[paper] [download link]
EPIC-Kitchens: Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, et al. "Scaling egocentric vision: The epic-kitchens dataset". ECCV(2018).[paper] [download link]

IJCV 2021

CSMN: Liang Han, Pichao Wang, Zhaozheng Yin, Fan Wang, Hao Li. "Context and Structure Mining Network for Video Object Detection". IJCV(2021).[paper][code]

ACM MM 2021

TransVOD: Lu He, Qianyu Zhou, Xiangtai Li, Li Niu, Guangliang Cheng, Xiao Li, Wenxuan Liu, Yunhai Tong, Lizhuang Ma, Liqing Zhang. "End-to-End Video Object Detection with Spatial-Temporal Transformers". ACM MM(2021).[paper][code]
VmAP: Anupam Sobti, Vaibhav Mavi, M Balakrishnan, Chetan Arora. "VmAP: A Fair Metric for Video Object Detection". ACM MM(2021).[paper]

AAAI 2021

MAMBA: Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson. "MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection". AAAI(2021).[paper]

ICCV 2021

TF-Blender: Yiming Cui, Liqi Yan, Zhiwen Cao, Dongfang Liu. "TF-Blender: Temporal Feature Blender for Video Object Detection". ICCV(2021).[paper][code]

ACM MM 2020

DSFNet: Lijian Lin, Haosheng Chen, Honglun Zhang, Jun Liang, Yu Li, Ying Shan, Hanzi Wang. "Dual Semantic Fusion Network for Video Object Detection". ACM MM(2020). [paper]
EBFA: Liang Han, Pichao Wang, Zhaozheng Yin, Fan Wang, Hao Li. "Exploiting Better Feature Aggregation for Video Object Detection". ACM MM(2020). [paper]

ECCV 2020

LSTS: Zhengkai Jiang, Yu Liu, Ceyuan Yang, Jihao Liu, Peng Gao, Qian Zhang, Shiming Xiang, Chunhong Pan. "Learning Where to Focus for Efficient Video Object Detection". ECCV(2020). [paper] [code]
OLTA: Chun-Han Yao, Chen Fang, Xiaohui Shen, Yangyue Wan, Ming-Hsuan Yang. "Video Object Detection via Object-level Temporal Aggregation". ECCV(2020). [paper]
HVRNet: Mingfei Han, Yali Wang, Xiaojun Chang, and Yu Qiao. "Mining Inter-Video Proposal Relations for Video Object Detection". ECCV(2020). [paper] [code]
CHP: Zhujun Xu, Emir Hrustic, and Damien Vivet. "CenterNet Heatmap Propagation for Real-time Video Object Detection". ECCV(2020). [paper]

CVPR 2020

MEGA: Yihong Chen, Yue Cao, Han Hu, Liwei Wang. "Memory Enhanced Global-Local Aggregation for Video Object Detection". CVPR(2020).[paper] [code]

AAAI 2020

TCENet: Fei He, Naiyu Gao, Qiaozhe Li, Senyao Du, Xin Zhao, Kaiqi Huang. "Temporal Context Enhanced Feature Aggregation for Video Object Detection". AAAI(2020).[paper]

ICCV 2019

RDN: Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, and Tao Mei. "Relation Distillation Networks for Video Object Detection". ICCV(2019).[paper]
SELSA: Haiping Wu, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang. "Sequence Level Semantics Aggregation for Video Object Detection". ICCV(2019).[paper] [code]
LLTR: Mykhailo Shvets, Wei Liu, Alexander C. Berg. "Leveraging Long-Range Temporal Relationships Between Proposals for Video Object Detection". ICCV(2019).[paper]
OGEMN: Hanming Deng, Yang Hua, Tao Song, Zongpu Zhang, Zhengui Xue, Ruhui Ma, Neil Robertson, and Haibing Guan. "Object Guided External Memory Network for Video Object Detection". ICCV(2019).[paper]
PSLA: Chaoxu Guo, Bin Fan, Jie Gu, Qian Zhang, Shiming Xiang, Veronique Prinet, Chunhong Pan. "Progressive Sparse Local Attention for Video Object Detection". ICCV(2019).[paper]
A Delay Metric for Video Object Detection: What Average Precision Fails to Tell: Huizi Mao, Xiaodong Yang, William J. Dally. "A Delay Metric for Video Object Detection: What Average Precision Fails to Tell". ICCV(2019).[paper]

AAAI 2019

LWDN: Zhengkai Jiang, Peng Gao, Chaoxu Guo, Qian Zhang, Shiming Xiang, Chunhong Pan. "Video Object Detection with Locally-Weighted Deformable Neighbors". AAAI(2019).[paper]
DorT: Hao Luo, Wenxuan Xie, Xinggang Wang, Wenjun Zeng. "Detect or Track: Towards Cost-Effective Video Object Detection/Tracking". AAAI(2019).[paper]

CVPR 2018

THP: Xizhou Zhu, Jifeng Dai, Lu Yuan, Yichen Wei. "Towards High Performance Video Object Detection". CVPR(2018).[paper]
LSTM-SSD: Mason Liu, Menglong Zhu. "Mobile Video Object Detection with Temporally-Aware Feature Maps". CVPR(2018).[paper]
ST-Lattice: Kai Chen, Jiaqi Wang, Shuo Yang, Xingcheng Zhang, Yuanjun Xiong, Chen Change Loy, Dahua Lin. "Optimizing Video Object Detection via a Scale-Time Lattice". CVPR(2018).[paper]

ECCV 2018

STSN: Gedas Bertasius, Lorenzo Torresani, Jianbo Shi. "Object Detection in Video with Spatiotemporal Sampling Networks". ECCV(2018).[paper]
STMN: Fanyi Xiao, Yong Jae Lee. "Video Object Detection with an Aligned Spatial-Temporal Memory". ECCV(2018).[paper] [code]
MANet: Shiyao Wang, Yucong Zhou, Junjie Yan, Zhidong Deng. "Fully Motion-Aware Network for Video Object Detection". ECCV(2018).[paper]

CVPR 2017

DFF: Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, Yichen Wei. "Deep Feature Flow for Video Recognition". CVPR(2017).[paper] [code]

ICCV 2017

FGFA: Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, Yichen Wei. "Flow-Guided Feature Aggregation for Video Object Detection". ICCV(2017).[paper] [code]
D&T: Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman. "Detect to Track and Track to Detect". ICCV(2017).[paper] [code]

Papers before 2017

T-CNN: Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, Wanli Ouyang. "T-CNN: Tubelets with convolutional neural networks for object detection from videos". IEEE Transactions on Circuits and Systems for Video Technology(2017).[paper] [code]
Object detection from video tubelets with convolutional neural networks: Kai Kang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang. "Object detection from video tubelets with convolutional neural networks". CVPR(2016).[paper] [code]
Seq-NMS: Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang. "Seq-NMS for Video Object Detection". ArXiv(2016).[paper]

