ICCV 2021 papers and open-source projects collection (papers with code)!
1617 papers accepted - 25.9% acceptance rate
ICCV 2021 accepted paper IDs: https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRfaTmsNweuaA0Gjyu58H_Cx56pGwFhcTYII0u1pg0U7MbhlgY0R6Y-BbK3xFhAiwGZ26u3TAtN5MnS/pubhtml
Note 1: Issues sharing ICCV 2021 papers and open-source projects are welcome!
Note 2: For CV top-conference papers from previous years, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
- Backbone
- Transformer
- GAN
- NAS
- NeRF
- Loss
- Long-tailed
- Unsupervised/Self-Supervised
- 2D Object Detection
- Semantic Segmentation
- Instance Segmentation
- Few-shot Segmentation
- Object Tracking
- 3D Point Cloud
- Point Cloud Semantic Segmentation
- Point Cloud Denoising
- Point Cloud Registration
- Super-Resolution
- Person Re-identification
- 2D/3D Human Pose Estimation
- 3D Head Reconstruction
- Action Recognition
- Text Detection
- Text Recognition
- Depth Estimation
- Crowd Counting
- Anomaly Detection
- Scene Graph Generation
- Datasets
- Others
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
- Paper(Oral): https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT
AutoFormer: Searching Transformers for Visual Recognition
Bias Loss for Mobile Neural Networks
- Paper: https://arxiv.org/abs/2107.11170
- Code: None
An Empirical Study of Training Self-Supervised Vision Transformers
- Paper(Oral): https://arxiv.org/abs/2104.02057
- MoCo v3 Code: None
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
- Paper(Oral): https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
- Paper: https://arxiv.org/abs/2107.12309
- Code: None
Labels4Free: Unsupervised Segmentation using StyleGAN
GNeRF: GAN-based Neural Radiance Field without Posed Camera
- Paper(Oral): https://arxiv.org/abs/2103.15606
EigenGAN: Layer-Wise Eigen-Learning for GANs
AutoFormer: Searching Transformers for Visual Recognition
GNeRF: GAN-based Neural Radiance Field without Posed Camera
- Paper(Oral): https://arxiv.org/abs/2103.15606
KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
In-Place Scene Labelling and Understanding with Implicit Scene Representation
- Homepage: https://shuaifengzhi.com/Semantic-NeRF/
- Paper(Oral): https://arxiv.org/abs/2103.15875
Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
- Homepage: https://ajayj.com/dietnerf
- Paper(DietNeRF): https://arxiv.org/abs/2104.00677
Rank & Sort Loss for Object Detection and Instance Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
Bias Loss for Mobile Neural Networks
- Paper: https://arxiv.org/abs/2107.11170
- Code: None
Parametric Contrastive Learning
- Paper: https://arxiv.org/abs/2107.12028
- Code: https://github.com/jiequancui/Parametric-Contrastive-Learning
An Empirical Study of Training Self-Supervised Vision Transformers
- Paper(Oral): https://arxiv.org/abs/2104.02057
- MoCo v3 Code: None
DetCo: Unsupervised Contrastive Learning for Object Detection
DetCo: Unsupervised Contrastive Learning for Object Detection
Detecting Invisible People
Active Learning for Deep Object Detection via Probabilistic Modeling
- Paper: https://arxiv.org/abs/2103.16130
- Code: None
Conditional Variational Capsule Network for Open Set Recognition
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
- Homepage: https://ashkamath.github.io/mdetr_page/
- Paper(Oral): https://arxiv.org/abs/2104.12763
- Code: https://github.com/ashkamath/mdetr
Rank & Sort Loss for Object Detection and Instance Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
SimROD: A Simple Adaptation Method for Robust Object Detection
- Paper(Oral): https://arxiv.org/abs/2107.13389
- Code: None
Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.11787
- Code: None
Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation
- Paper(Oral): https://arxiv.org/abs/2107.11279
- Code: https://github.com/CVMI-Lab/DARS
Labels4Free: Unsupervised Segmentation using StyleGAN
Instances as Queries
Crossover Learning for Fast Online Video Instance Segmentation
Rank & Sort Loss for Object Detection and Instance Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
Mining Latent Classes for Few-shot Segmentation
- Paper(Oral): https://arxiv.org/abs/2103.15402
- Code: https://github.com/LiheYoung/MiningFSS
Learning to Adversarially Blur Visual Object Tracking
Unsupervised Point Cloud Pre-Training via View-Point Occlusion, Completion
- Homepage: https://hansen7.github.io/OcCo/
- Paper: https://arxiv.org/abs/2010.01089
- Code: https://github.com/hansen7/OcCo
ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.11769
- Code: None
Score-Based Point Cloud Denoising
- Paper: https://arxiv.org/abs/2107.10981
- Code: None
HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration
- Homepage: https://ispc-group.github.io/hregnet
- Paper: https://arxiv.org/abs/2107.11992
- Code: https://github.com/ispc-lab/HRegNet
Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks
TransReID: Transformer-based Object Re-Identification
Human Pose Regression with Residual Log-likelihood Estimation
- Paper(Oral): https://arxiv.org/abs/2107.11291
- Code(RLE): https://github.com/Jeff-sjtu/res-loglikelihood-regression
H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction
- Homepage: https://crisalixsa.github.io/h3d-net/
MGSampler: An Explainable Sampling Strategy for Video Action Recognition
- Paper: https://arxiv.org/abs/2104.09952
- Code: None
Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
- Paper: https://arxiv.org/abs/2107.12090
- Code: None
MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments
- Paper: https://arxiv.org/abs/2107.12429
- Code: None
Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework
- Paper(Oral): https://arxiv.org/abs/2107.12746
- Code(P2PNet): https://github.com/TencentYoutuResearch/CrowdCounting-P2PNet
Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
- Paper: https://arxiv.org/abs/2107.12309
- Code: None
H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction
- Homepage: https://crisalixsa.github.io/h3d-net/
Hand-Object Contact Consistency Reasoning for Human Grasps Generation
- Homepage: https://hwjiang1510.github.io/GraspTTA/
- Paper(Oral): https://arxiv.org/abs/2104.03304
- Code: None
Equivariant Imaging: Learning Beyond the Range Space
- Paper(Oral): https://arxiv.org/abs/2103.14756
- Code: https://github.com/edongdongchen/EI
Just Ask: Learning to Answer Questions from Millions of Narrated Videos
- Paper(Oral): https://arxiv.org/abs/2012.00451
- Code: https://github.com/antoyang/just-ask