ICCV2021-Papers-with-Code
A collection of ICCV 2021 papers and open-source projects (papers with code)!
1617 papers accepted - 25.9% acceptance rate
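ICCV 2021 accepted paper IDs: https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRfaTmsNweuaA0Gjyu58H_Cx56pGwFhcTYII0u1pg0U7MbhlgY0R6Y-BbK3xFhAiwGZ26u3TAtN5MnS/pubhtml
Note 1: Feel free to open an issue to share ICCV 2021 papers and open-source projects!
Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
[ICCV 2021 Papers and Open-Source Projects: Contents]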
- Backbone
- Transformer
- GAN
- NAS
- NeRF
- Loss
- Zero-Shot Learning
- Few-Shot Learning
- Long-tailed
- Unsupervised/Self-Supervised
- 2D Object Detection
- Semantic Segmentation
- Instance Segmentation
- Medical Image Segmentation
- Few-shot Segmentation
- Human Motion Segmentation
- Object Tracking
- 3D Point Cloud
- Point Cloud Object Detection
- Point Cloud Semantic Segmentation
- Point Cloud Denoising
- Point Cloud Registration
- Super-Resolution
- Person Re-identification
- 2D/3D Human Pose Estimation
- 3D Head Reconstruction
- Action Recognition
- Temporal Action Localization
- Text Detection
- Text Recognition
- Visual Question Answering (VQA)
- Adversarial Attack
- Depth Estimation
- Gaze Estimation
- Crowd Counting
- Trajectory Prediction
- Anomaly Detection
- Scene Graph Generation
- Unsupervised Domain Adaptation
- Video Rescaling
- Hand-Object Interaction
- Datasets
- Others
Backbone
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
- Paper(Oral): https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT
AutoFormer: Searching Transformers for Visual Recognition
Bias Loss for Mobile Neural Networks
- Paper: https://arxiv.org/abs/2107.11170
- Code: None
Visual Transformer
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
An Empirical Study of Training Self-Supervised Vision Transformers
- Paper(Oral): https://arxiv.org/abs/2104.02057
- MoCo v3 Code: None
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
- Paper(Oral): https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT
Group-Free 3D Object Detection via Transformers
- Paper: https://arxiv.org/abs/2104.00678
- Code: None
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
- Paper: https://arxiv.org/abs/2107.12309
- Code: None
Rethinking and Improving Relative Position Encoding for Vision Transformer
- Paper: https://arxiv.org/abs/2107.14222
- Code: None
Emerging Properties in Self-Supervised Vision Transformers
GAN
Labels4Free: Unsupervised Segmentation using StyleGAN
GNeRF: GAN-based Neural Radiance Field without Posed Camera
- Paper(Oral): https://arxiv.org/abs/2103.15606
EigenGAN: Layer-Wise Eigen-Learning for GANs
From Continuity to Editability: Inverting GANs with Consecutive Images
- Paper: https://arxiv.org/abs/2107.13812
- Code: https://github.com/Qingyang-Xu/InvertingGANs_with_ConsecutiveImgs
NAS
AutoFormer: Searching Transformers for Visual Recognition
NeRF
GNeRF: GAN-based Neural Radiance Field without Posed Camera
- Paper(Oral): https://arxiv.org/abs/2103.15606
KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
In-Place Scene Labelling and Understanding with Implicit Scene Representation
- Homepage: https://shuaifengzhi.com/Semantic-NeRF/
- Paper(Oral): https://arxiv.org/abs/2103.15875
Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
- Homepage: https://ajayj.com/dietnerf
- Paper(DietNeRF): https://arxiv.org/abs/2104.00677
Loss
Rank & Sort Loss for Object Detection and Instance Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
Bias Loss for Mobile Neural Networks
- Paper: https://arxiv.org/abs/2107.11170
- Code: None
Zero-Shot Learning
FREE: Feature Refinement for Generalized Zero-Shot Learning
Few-Shot Learning
Few-Shot and Continual Learning with Attentive Independent Mechanisms
Long-tailed
Parametric Contrastive Learning
- Paper: https://arxiv.org/abs/2107.12028
- Code: https://github.com/jiequancui/Parametric-Contrastive-Learning
Unsupervised/Self-Supervised
An Empirical Study of Training Self-Supervised Vision Transformers
- Paper(Oral): https://arxiv.org/abs/2104.02057
- MoCo v3 Code: None
DetCo: Unsupervised Contrastive Learning for Object Detection
2D Object Detection
DetCo: Unsupervised Contrastive Learning for Object Detection
Detecting Invisible People
Active Learning for Deep Object Detection via Probabilistic Modeling
- Paper: https://arxiv.org/abs/2103.16130
- Code: None
Conditional Variational Capsule Network for Open Set Recognition
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
- Homepage: https://ashkamath.github.io/mdetr_page/
- Paper(Oral): https://arxiv.org/abs/2104.12763
- Code: https://github.com/ashkamath/mdetr
Rank & Sort Loss for Object Detection and Instance Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
SimROD: A Simple Adaptation Method for Robust Object Detection
- Paper(Oral): https://arxiv.org/abs/2107.13389
- Code: None
GraphFPN: Graph Feature Pyramid Network for Object Detection
- Paper: https://arxiv.org/abs/2108.00580
- Code: None
Semi-Supervised Object Detection
End-to-End Semi-Supervised Object Detection with Soft Teacher
- Paper: https://arxiv.org/abs/2106.09018
- Code: None
Semantic Segmentation
Personalized Image Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.13978
- Code: https://github.com/zhangyuygss/PIS
- Dataset: https://github.com/zhangyuygss/PIS
Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.11264
- Code: None
Semi-supervised Semantic Segmentation
Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.11787
- Code: None
Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation
- Paper(Oral): https://arxiv.org/abs/2107.11279
- Code: https://github.com/CVMI-Lab/DARS
Unsupervised Segmentation
Labels4Free: Unsupervised Segmentation using StyleGAN
Instance Segmentation
Instances as Queries
Crossover Learning for Fast Online Video Instance Segmentation
Rank & Sort Loss for Object Detection and Instance Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss
Medical Image Segmentation
Recurrent Mask Refinement for Few-Shot Medical Image Segmentation
Few-shot Segmentation
Mining Latent Classes for Few-shot Segmentation
- Paper(Oral): https://arxiv.org/abs/2103.15402
- Code: https://github.com/LiheYoung/MiningFSS
Human Motion Segmentation
Graph Constrained Data Representation Learning for Human Motion Segmentation
- Paper: https://arxiv.org/abs/2107.13362
- Code: None
Object Tracking
Learning Spatio-Temporal Transformer for Visual Tracking
Learning to Adversarially Blur Visual Object Tracking
HiFT: Hierarchical Feature Transformer for Aerial Tracking
Learn to Match: Automatic Matching Network Design for Visual Tracking
3D Point Cloud
Unsupervised Point Cloud Pre-Training via View-Point Occlusion, Completion
- Homepage: https://hansen7.github.io/OcCo/
- Paper: https://arxiv.org/abs/2010.01089
- Code: https://github.com/hansen7/OcCo
Point Cloud Object Detection
Group-Free 3D Object Detection via Transformers
- Paper: https://arxiv.org/abs/2104.00678
- Code: None
Point Cloud Semantic Segmentation
ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.11769
- Code: None
Learning with Noisy Labels for Robust Point Cloud Segmentation
- Homepage: https://shuquanye.com/PNAL_website/
- Paper(Oral): https://arxiv.org/abs/2107.14230
VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation
- Paper(Oral): https://arxiv.org/abs/2107.13824
- Code: https://github.com/hzykent/VMNet
Point Cloud Denoising
Score-Based Point Cloud Denoising
- Paper: https://arxiv.org/abs/2107.10981
- Code: None
Point Cloud Registration
HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration
- Homepage: https://ispc-group.github.io/hregnet
- Paper: https://arxiv.org/abs/2107.11992
- Code: https://github.com/ispc-lab/HRegNet
Super-Resolution
Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks
Person Re-identification
TransReID: Transformer-based Object Re-Identification
2D/3D Human Pose Estimation
2D Human Pose Estimation
Human Pose Regression with Residual Log-likelihood Estimation
- Paper(Oral): https://arxiv.org/abs/2107.11291
- Code(RLE): https://github.com/Jeff-sjtu/res-loglikelihood-regression
3D Human Pose Estimation
Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows
- Paper: https://arxiv.org/abs/2107.13788
- Code: https://github.com/twehrbein/Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows
3D Head Reconstruction
H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction
- Homepage: https://crisalixsa.github.io/h3d-net/
Action Recognition
MGSampler: An Explainable Sampling Strategy for Video Action Recognition
- Paper: https://arxiv.org/abs/2104.09952
- Code: None
Temporal Action Localization
Enriching Local and Global Contexts for Temporal Action Localization
- Paper: https://arxiv.org/abs/2107.12960
- Code: None
Text Detection
Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection
Text Recognition
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
- Paper: https://arxiv.org/abs/2107.12090
- Code: None
Visual Question Answering (VQA)
Greedy Gradient Ensemble for Robust Visual Question Answering
Adversarial Attack
Feature Importance-aware Transferable Adversarial Attacks
Depth Estimation
Monocular Depth Estimation
MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments
- Paper: https://arxiv.org/abs/2107.12429
- Code: None
Gaze Estimation
Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation
Crowd Counting
Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework
- Paper(Oral): https://arxiv.org/abs/2107.12746
- Code(P2PNet): https://github.com/TencentYoutuResearch/CrowdCounting-P2PNet
Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting
- Paper: https://arxiv.org/abs/2107.12619
- Code: https://github.com/TencentYoutuResearch/CrowdCounting-UEPNet
Trajectory Prediction
Human Trajectory Prediction via Counterfactual Analysis
Personalized Trajectory Prediction via Distribution Discrimination
Anomaly Detection
Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
Scene Graph Generation
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
- Paper: https://arxiv.org/abs/2107.12309
- Code: None
Unsupervised Domain Adaptation
Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation
- Paper(Oral): https://arxiv.org/abs/2107.13467
- Code: None
Video Rescaling
Self-Conditioned Probabilistic Learning of Video Rescaling
- Code: None
Hand-Object Interaction
Learning a Contact Potential Field to Model the Hand-Object Interaction
Datasets
Personalized Image Semantic Segmentation
- Paper: https://arxiv.org/abs/2107.13978
- Code: https://github.com/zhangyuygss/PIS
- Dataset: https://github.com/zhangyuygss/PIS
H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction
- Homepage: https://crisalixsa.github.io/h3d-net/
Others
Progressive Correspondence Pruning by Consensus Learning
- Homepage: https://sailor-z.github.io/projects/CLNet.html
- Paper: https://arxiv.org/abs/2101.00591
- Code: https://github.com/sailor-z/CLNet
Energy-Based Open-World Uncertainty Modeling for Confidence Calibration
- Paper: https://arxiv.org/abs/2107.12628
- Code: None
Generalized Shuffled Linear Regression
- Paper: https://drive.google.com/file/d/1Qu21VK5qhCW8WVjiRnnBjehrYVmQrDNh/view?usp=sharing
- Code: https://github.com/SILI1994/Generalized-Shuffled-Linear-Regression
Discovering 3D Parts from Image Collections
- Homepage: https://chhankyao.github.io/lpd/
Semi-Supervised Active Learning with Temporal Output Discrepancy
Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?
- Paper: https://arxiv.org/abs/2105.02498
- Code: https://github.com/KingJamesSong/DifferentiableSVD
Hand-Object Contact Consistency Reasoning for Human Grasps Generation
- Homepage: https://hwjiang1510.github.io/GraspTTA/
- Paper(Oral): https://arxiv.org/abs/2104.03304
- Code: None
Equivariant Imaging: Learning Beyond the Range Space
- Paper(Oral): https://arxiv.org/abs/2103.14756
- Code: https://github.com/edongdongchen/EI
Just Ask: Learning to Answer Questions from Millions of Narrated Videos
- Paper(Oral): https://arxiv.org/abs/2012.00451
- Code: https://github.com/antoyang/just-ask