ICCV2021-Papers-with-Code

A collection of ICCV 2021 papers and open-source projects (papers with code)!

1617 papers accepted - 25.9% acceptance rate

ICCV 2021 accepted paper IDs: https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRfaTmsNweuaA0Gjyu58H_Cx56pGwFhcTYII0u1pg0U7MbhlgY0R6Y-BbK3xFhAiwGZ26u3TAtN5MnS/pubhtml

Note 1: Everyone is welcome to submit an issue to share ICCV 2021 papers and open-source projects!

Note 2: For papers from previous years' top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

[ICCV 2021 Papers and Open-Source Code: Table of Contents]

Backbone
Visual Transformer
GAN
NAS
NeRF
Loss
Long-tailed
Unsupervised/Self-Supervised
2D Object Detection
Semantic Segmentation
Instance Segmentation
Few-shot Segmentation
Object Tracking
3D Point Cloud
Point Cloud Semantic Segmentation
Point Cloud Denoising
Point Cloud Registration
Super-Resolution
Person Re-identification
2D/3D Human Pose Estimation
3D Head Reconstruction
Action Recognition
Text Detection
Text Recognition
Depth Estimation
Crowd Counting
Anomaly Detection
Scene Graph Generation
Datasets
Others

Backbone

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Paper(Oral): https://arxiv.org/abs/2102.12122
Code: https://github.com/whai362/PVT

AutoFormer: Searching Transformers for Visual Recognition

Paper: https://arxiv.org/abs/2107.00651
Code: https://github.com/microsoft/AutoML

Bias Loss for Mobile Neural Networks

Paper: https://arxiv.org/abs/2107.11170
Code: None

Visual Transformer

An Empirical Study of Training Self-Supervised Vision Transformers

Paper(Oral): https://arxiv.org/abs/2104.02057
MoCo v3 Code: None

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Paper(Oral): https://arxiv.org/abs/2102.12122
Code: https://github.com/whai362/PVT

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

Paper: https://arxiv.org/abs/2107.12309
Code: None

GAN

Labels4Free: Unsupervised Segmentation using StyleGAN

Homepage: https://rameenabdal.github.io/Labels4Free/
Paper: https://arxiv.org/abs/2103.14968

GNeRF: GAN-based Neural Radiance Field without Posed Camera

Paper(Oral): https://arxiv.org/abs/2103.15606
Code: https://github.com/MQ66/gnerf

EigenGAN: Layer-Wise Eigen-Learning for GANs

Paper: https://arxiv.org/abs/2104.12476
Code: https://github.com/LynnHo/EigenGAN-Tensorflow

NAS

AutoFormer: Searching Transformers for Visual Recognition

Paper: https://arxiv.org/abs/2107.00651
Code: https://github.com/microsoft/AutoML

NeRF

GNeRF: GAN-based Neural Radiance Field without Posed Camera

Paper(Oral): https://arxiv.org/abs/2103.15606
Code: https://github.com/MQ66/gnerf

KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs

Paper: https://arxiv.org/abs/2103.13744
Code: https://github.com/creiser/kilonerf

In-Place Scene Labelling and Understanding with Implicit Scene Representation

Homepage: https://shuaifengzhi.com/Semantic-NeRF/
Paper(Oral): https://arxiv.org/abs/2103.15875

Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis

Homepage: https://ajayj.com/dietnerf
Paper(DietNeRF): https://arxiv.org/abs/2104.00677

Loss

Rank & Sort Loss for Object Detection and Instance Segmentation

Paper(Oral): https://arxiv.org/abs/2107.11669
Code: https://github.com/kemaloksuz/RankSortLoss

Bias Loss for Mobile Neural Networks

Paper: https://arxiv.org/abs/2107.11170
Code: None

Long-tailed

Parametric Contrastive Learning

Unsupervised/Self-Supervised

An Empirical Study of Training Self-Supervised Vision Transformers

Paper(Oral): https://arxiv.org/abs/2104.02057
MoCo v3 Code: None

DetCo: Unsupervised Contrastive Learning for Object Detection

Paper: https://arxiv.org/abs/2102.04803
Code: https://github.com/xieenze/DetCo

2D Object Detection

DetCo: Unsupervised Contrastive Learning for Object Detection

Paper: https://arxiv.org/abs/2102.04803
Code: https://github.com/xieenze/DetCo

Detecting Invisible People

Homepage: http://www.cs.cmu.edu/~tkhurana/invisible.htm
Paper: https://arxiv.org/abs/2012.08419

Active Learning for Deep Object Detection via Probabilistic Modeling

Paper: https://arxiv.org/abs/2103.16130
Code: None

Conditional Variational Capsule Network for Open Set Recognition

Paper: https://arxiv.org/abs/2104.09159
Code: https://github.com/guglielmocamporese/cvaecaposr

MDETR: Modulated Detection for End-to-End Multi-Modal Understanding

Homepage: https://ashkamath.github.io/mdetr_page/
Paper(Oral): https://arxiv.org/abs/2104.12763
Code: https://github.com/ashkamath/mdetr

Rank & Sort Loss for Object Detection and Instance Segmentation

Paper(Oral): https://arxiv.org/abs/2107.11669
Code: https://github.com/kemaloksuz/RankSortLoss

SimROD: A Simple Adaptation Method for Robust Object Detection

Paper(Oral): https://arxiv.org/abs/2107.13389
Code: None

Semantic Segmentation

Semi-supervised Semantic Segmentation

Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation

Paper: https://arxiv.org/abs/2107.11787
Code: None

Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation

Paper(Oral): https://arxiv.org/abs/2107.11279
Code: https://github.com/CVMI-Lab/DARS

Unsupervised Segmentation

Labels4Free: Unsupervised Segmentation using StyleGAN

Homepage: https://rameenabdal.github.io/Labels4Free/
Paper: https://arxiv.org/abs/2103.14968

Instance Segmentation

Instances as Queries

Paper: https://arxiv.org/abs/2105.01928
Code: https://github.com/hustvl/QueryInst

Crossover Learning for Fast Online Video Instance Segmentation

Paper: https://arxiv.org/abs/2104.05970
Code: https://github.com/hustvl/CrossVIS

Rank & Sort Loss for Object Detection and Instance Segmentation

Paper(Oral): https://arxiv.org/abs/2107.11669
Code: https://github.com/kemaloksuz/RankSortLoss

Few-shot Segmentation

Mining Latent Classes for Few-shot Segmentation

Paper(Oral): https://arxiv.org/abs/2103.15402
Code: https://github.com/LiheYoung/MiningFSS

Object Tracking

Learning to Adversarially Blur Visual Object Tracking

Paper: https://arxiv.org/abs/2107.12085
Code: https://github.com/tsingqguo/ABA

3D Point Cloud

Unsupervised Point Cloud Pre-Training via View-Point Occlusion, Completion

Homepage: https://hansen7.github.io/OcCo/
Paper: https://arxiv.org/abs/2010.01089
Code: https://github.com/hansen7/OcCo

Point Cloud Semantic Segmentation

ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation

Paper: https://arxiv.org/abs/2107.11769
Code: None

Point Cloud Denoising

Score-Based Point Cloud Denoising

Paper: https://arxiv.org/abs/2107.10981
Code: None

Point Cloud Registration

HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration

Homepage: https://ispc-group.github.io/hregnet
Paper: https://arxiv.org/abs/2107.11992
Code: https://github.com/ispc-lab/HRegNet

Super-Resolution

Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks

Paper: https://arxiv.org/abs/2004.03791
Code: https://github.com/LongguangWang/ArbSR

Person Re-identification

TransReID: Transformer-based Object Re-Identification

Paper: https://arxiv.org/abs/2102.04378
Code: https://github.com/heshuting555/TransReID

2D/3D Human Pose Estimation

2D Human Pose Estimation

Human Pose Regression with Residual Log-likelihood Estimation

Paper(Oral): https://arxiv.org/abs/2107.11291
Code(RLE): https://github.com/Jeff-sjtu/res-loglikelihood-regression

3D Head Reconstruction

H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction

Homepage: https://crisalixsa.github.io/h3d-net/
Paper: https://arxiv.org/abs/2107.12512

Action Recognition

MGSampler: An Explainable Sampling Strategy for Video Action Recognition

Paper: https://arxiv.org/abs/2104.09952
Code: None

Text Detection

Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection

Paper: https://arxiv.org/abs/2107.12664
Code: https://github.com/GXYM/TextBPN

Text Recognition

Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition

Paper: https://arxiv.org/abs/2107.12090
Code: None

Depth Estimation

Monocular Depth Estimation

MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments

Paper: https://arxiv.org/abs/2107.12429
Code: None

Crowd Counting

Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework

Paper(Oral): https://arxiv.org/abs/2107.12746
Code(P2PNet): https://github.com/TencentYoutuResearch/CrowdCounting-P2PNet

Anomaly Detection

Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning

Paper: https://arxiv.org/abs/2101.10030
Code: https://github.com/tianyu0207/RTFM

Scene Graph Generation

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

Paper: https://arxiv.org/abs/2107.12309
Code: None

Datasets

H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction

Homepage: https://crisalixsa.github.io/h3d-net/
Paper: https://arxiv.org/abs/2107.12512

Others

Hand-Object Contact Consistency Reasoning for Human Grasps Generation

Homepage: https://hwjiang1510.github.io/GraspTTA/
Paper(Oral): https://arxiv.org/abs/2104.03304
Code: None

Equivariant Imaging: Learning Beyond the Range Space

Paper(Oral): https://arxiv.org/abs/2103.14756
Code: https://github.com/edongdongchen/EI

Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Paper(Oral): https://arxiv.org/abs/2012.00451
Code: https://github.com/antoyang/just-ask
