ICCV 2021 papers and open-source projects collection (papers with code)!
1617 papers accepted - 25.9% acceptance rate
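ICCV 2021 accepted paper IDs: https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRfaTmsNweuaA0Gjyu58H_Cx56pGwFhcTYII0u1pg0U7MbhlgY0R6Y-BbK3xFhAiwGZ26u3TAtN5MnS/pubhtml
Note 1: Everyone is welcome to submit an issue to share ICCV 2021 papers and open-source projects!
Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision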
- Backbone
- Transformer
- GAN
- NAS
- NeRF
- Loss
- Zero-Shot Learning
- Few-Shot Learning
- Long-tailed
- Vision and Language
- Unsupervised/Self-Supervised
- Multi-Label Image Recognition
- 2D Object Detection
- Semantic Segmentation
- Instance Segmentation
- Medical Image Segmentation
- Few-shot Segmentation
- Human Motion Segmentation
- Object Tracking
- 3D Point Cloud
- 3D Object Detection (Point Cloud)
- 3D Semantic Segmentation (Point Cloud)
- 3D Instance Segmentation (Point Cloud)
- Point Cloud Denoising
- Point Cloud Registration
- Super-Resolution
- Video Frame Interpolation
- Person Re-identification
- 2D/3D Human Pose Estimation
- 3D Head Reconstruction
- Action Recognition
- Temporal Action Localization
- Text Detection
- Text Recognition
- Visual Question Answering (VQA)
- Adversarial Attack
- Depth Estimation
- Gaze Estimation
- Crowd Counting
- Trajectory Prediction
- Anomaly Detection
- Scene Graph Generation
- Image Editing
- Unsupervised Domain Adaptation
- Video Rescaling
- Hand-Object Interaction
- Datasets
- Others
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
- Paper(Oral): https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT
AutoFormer: Searching Transformers for Visual Recognition
Bias Loss for Mobile Neural Networks
- Paper: https://arxiv.org/abs/2107.11170
- Code: None
Vision Transformer with Progressive Sampling
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Rethinking Spatial Dimensions of Vision Transformers
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
An Empirical Study of Training Self-Supervised Vision Transformers
- Paper(Oral): https://arxiv.org/abs/2104.02057
- MoCo v3 Code: None
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
- Paper(Oral): https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT
Group-Free 3D Object Detection via Transformers
- Paper: https://arxiv.org/abs/2104.00678
- Code: None
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
- Paper: https://arxiv.org/abs/2107.12309
- Code: None
Rethinking and Improving Relative Position Encoding for Vision Transformer
Emerging Properties in Self-Supervised Vision Transformers
Learning Spatio-Temporal Transformer for Visual Tracking
Fast Convergence of DETR with Spatially Modulated Co-Attention
Vision Transformer with Progressive Sampling
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Rethinking Spatial Dimensions of Vision Transformers
Labels4Free: Unsupervised Segmentation using StyleGAN
GNeRF: GAN-based Neural Radiance Field without Posed Camera
- Paper(Oral): https://arxiv.org/abs/2103.15606
EigenGAN: Layer-Wise Eigen-Learning for GANs
From Continuity to Editability: Inverting GANs with Consecutive Images
- Paper: https://arxiv.org/abs/2107.13812
- Code: https://github.com/Qingyang-Xu/InvertingGANs_with_ConsecutiveImgs
Sketch Your Own GAN
AutoFormer: Searching Transformers for Visual Recognition
GNeRF: GAN-based Neural Radiance Field without Posed Camera
- Paper(Oral): https://arxiv.org/abs/2103.15606
KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
In-Place Scene Labelling and Understanding with Implicit Scene Representation
- Homepage: https://shuaifengzhi.com/Semantic-NeRF/
- Paper(Oral): https://arxiv.org/abs/2103.15875
Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
- Homepage: https://ajayj.com/dietnerf
- Paper(DietNeRF): https://arxiv.org/abs/2104.00677
Rank & Sort Loss for Object Detection and Instance Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
Bias Loss for Mobile Neural Networks
- Paper: https://arxiv.org/abs/2107.11170
- Code: None
FREE: Feature Refinement for Generalized Zero-Shot Learning
Few-Shot and Continual Learning with Attentive Independent Mechanisms
Parametric Contrastive Learning
- Paper: https://arxiv.org/abs/2107.12028
- Code: https://github.com/jiequancui/Parametric-Contrastive-Learning
VLGrammar: Grounded Grammar Induction of Vision and Language
An Empirical Study of Training Self-Supervised Vision Transformers
- Paper(Oral): https://arxiv.org/abs/2104.02057
- MoCo v3 Code: None
DetCo: Unsupervised Contrastive Learning for Object Detection
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
- Paper: https://arxiv.org/abs/2108.02183
- Code: None
Residual Attention: A Simple but Effective Method for Multi-Label Recognition
- Paper: https://arxiv.org/abs/2108.02456
- Code: None
DetCo: Unsupervised Contrastive Learning for Object Detection
Detecting Invisible People
Active Learning for Deep Object Detection via Probabilistic Modeling
- Paper: https://arxiv.org/abs/2103.16130
- Code: None
Conditional Variational Capsule Network for Open Set Recognition
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
- Homepage: https://ashkamath.github.io/mdetr_page/
- Paper(Oral): https://arxiv.org/abs/2104.12763
- Code: https://github.com/ashkamath/mdetr
Rank & Sort Loss for Object Detection and Instance Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
SimROD: A Simple Adaptation Method for Robust Object Detection
- Paper(Oral): https://arxiv.org/abs/2107.13389
- Code: None
GraphFPN: Graph Feature Pyramid Network for Object Detection
- Paper: https://arxiv.org/abs/2108.00580
- Code: None
Fast Convergence of DETR with Spatially Modulated Co-Attention
End-to-End Semi-Supervised Object Detection with Soft Teacher
- Paper: https://arxiv.org/abs/2106.09018
- Code: None
Personalized Image Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.13978
- Code: https://github.com/zhangyuygss/PIS
- Dataset: https://github.com/zhangyuygss/PIS
Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.11264
- Code: None
Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.11787
- Code: None
Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation
- Paper(Oral): https://arxiv.org/abs/2107.11279
- Code: https://github.com/CVMI-Lab/DARS
Labels4Free: Unsupervised Segmentation using StyleGAN
Instances as Queries
Crossover Learning for Fast Online Video Instance Segmentation
Rank & Sort Loss for Object Detection and Instance Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
Recurrent Mask Refinement for Few-Shot Medical Image Segmentation
Mining Latent Classes for Few-shot Segmentation
- Paper(Oral): https://arxiv.org/abs/2103.15402
- Code: https://github.com/LiheYoung/MiningFSS
Graph Constrained Data Representation Learning for Human Motion Segmentation
- Paper: https://arxiv.org/abs/2107.13362
- Code: None
Learning Spatio-Temporal Transformer for Visual Tracking
Learning to Adversarially Blur Visual Object Tracking
HiFT: Hierarchical Feature Transformer for Aerial Tracking
Learn to Match: Automatic Matching Network Design for Visual Tracking
Unsupervised Point Cloud Pre-Training via View-Point Occlusion, Completion
- Homepage: https://hansen7.github.io/OcCo/
- Paper: https://arxiv.org/abs/2010.01089
- Code: https://github.com/hansen7/OcCo
Group-Free 3D Object Detection via Transformers
- Paper: https://arxiv.org/abs/2104.00678
- Code: None
ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.11769
- Code: None
Learning with Noisy Labels for Robust Point Cloud Segmentation
- Homepage: https://shuquanye.com/PNAL_website/
- Paper(Oral): https://arxiv.org/abs/2107.14230
VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.13824
- Code: https://github.com/hzykent/VMNet
Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation
Hierarchical Aggregation for 3D Instance Segmentation
Score-Based Point Cloud Denoising
- Paper: https://arxiv.org/abs/2107.10981
- Code: None
HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration
- Homepage: https://ispc-group.github.io/hregnet
- Paper: https://arxiv.org/abs/2107.11992
- Code: https://github.com/ispc-lab/HRegNet
Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks
XVFI: eXtreme Video Frame Interpolation
- Paper(Oral): https://arxiv.org/abs/2103.16206
- Code: https://github.com/JihyongOh/XVFI
- Dataset: https://github.com/JihyongOh/XVFI
TransReID: Transformer-based Object Re-Identification
IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID
- Paper(Oral): https://arxiv.org/abs/2108.02413
- Code: https://github.com/SikaStar/IDM
Human Pose Regression with Residual Log-likelihood Estimation
- Paper(Oral): https://arxiv.org/abs/2107.11291
- Code(RLE): https://github.com/Jeff-sjtu/res-loglikelihood-regression
Online Knowledge Distillation for Efficient Pose Estimation
- Paper: https://arxiv.org/abs/2108.02092
- Code: None
Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows
- Paper: https://arxiv.org/abs/2107.13788
- Code: https://github.com/twehrbein/Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows
H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction
- Homepage: https://crisalixsa.github.io/h3d-net/
MGSampler: An Explainable Sampling Strategy for Video Action Recognition
- Paper: https://arxiv.org/abs/2104.09952
- Code: None
Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
- Paper: https://arxiv.org/abs/2108.02183
- Code: None
Enriching Local and Global Contexts for Temporal Action Localization
- Paper: https://arxiv.org/abs/2107.12960
- Code: None
Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
- Paper: https://arxiv.org/abs/2107.12090
- Code: None
Greedy Gradient Ensemble for Robust Visual Question Answering
Feature Importance-aware Transferable Adversarial Attacks
MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments
- Paper: https://arxiv.org/abs/2107.12429
- Code: None
Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation
Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework
- Paper(Oral): https://arxiv.org/abs/2107.12746
- Code(P2PNet): https://github.com/TencentYoutuResearch/CrowdCounting-P2PNet
Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting
- Paper: https://arxiv.org/abs/2107.12619
- Code: https://github.com/TencentYoutuResearch/CrowdCounting-UEPNet
Human Trajectory Prediction via Counterfactual Analysis
Personalized Trajectory Prediction via Distribution Discrimination
Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
- Paper: https://arxiv.org/abs/2107.12309
- Code: None
Sketch Your Own GAN
Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation
- Paper(Oral): https://arxiv.org/abs/2107.13467
- Code: None
Self-Conditioned Probabilistic Learning of Video Rescaling
- Code: None
Learning a Contact Potential Field to Model the Hand-Object Interaction
XVFI: eXtreme Video Frame Interpolation
- Paper(Oral): https://arxiv.org/abs/2103.16206
- Code: https://github.com/JihyongOh/XVFI
- Dataset: https://github.com/JihyongOh/XVFI
Personalized Image Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.13978
- Code: https://github.com/zhangyuygss/PIS
- Dataset: https://github.com/zhangyuygss/PIS
H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction
- Homepage: https://crisalixsa.github.io/h3d-net/
Out-of-Core Surface Reconstruction via Global TGV Minimization
- Paper: https://arxiv.org/abs/2107.14790
- Code: None
Progressive Correspondence Pruning by Consensus Learning
- Homepage: https://sailor-z.github.io/projects/CLNet.html
- Paper: https://arxiv.org/abs/2101.00591
- Code: https://github.com/sailor-z/CLNet
Project homepage:
Energy-Based Open-World Uncertainty Modeling for Confidence Calibration
- Paper: https://arxiv.org/abs/2107.12628
- Code: None
Generalized Shuffled Linear Regression
- Paper: https://drive.google.com/file/d/1Qu21VK5qhCW8WVjiRnnBjehrYVmQrDNh/view?usp=sharing
- Code: https://github.com/SILI1994/Generalized-Shuffled-Linear-Regression
Discovering 3D Parts from Image Collections
- Homepage: https://chhankyao.github.io/lpd/
Semi-Supervised Active Learning with Temporal Output Discrepancy
Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?
- Paper: https://arxiv.org/abs/2105.02498
- Code: https://github.com/KingJamesSong/DifferentiableSVD
Hand-Object Contact Consistency Reasoning for Human Grasps Generation
- Homepage: https://hwjiang1510.github.io/GraspTTA/
- Paper(Oral): https://arxiv.org/abs/2104.03304
- Code: None
Equivariant Imaging: Learning Beyond the Range Space
- Paper(Oral): https://arxiv.org/abs/2103.14756
- Code: https://github.com/edongdongchen/EI
Just Ask: Learning to Answer Questions from Millions of Narrated Videos
- Paper(Oral): https://arxiv.org/abs/2012.00451
- Code: https://github.com/antoyang/just-ask