CVPR 2022 论文和开源项目合集(Papers with Code)

CVPR 2022 论文和开源项目合集(papers with code)！

CVPR 2022 收录列表ID：https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view

注1：欢迎各位大佬提交issue，分享CVPR 2022论文和开源项目！

注2：关于往年CV顶会论文以及其他优质CV论文和大盘点，详见： https://github.com/amusi/daily-paper-computer-vision

CVPR 2019

CVPR 2020

CVPR 2021

如果你想了解最新最优质的的CV论文、开源项目和学习资料，欢迎扫码加入【CVer学术交流群】！互相学习，一起进步~

【CVPR 2022 论文开源目录】

Backbone
CLIP
NAS
NeRF
Visual Transformer
自监督学习(Self-supervised Learning)
数据增强(Data Augmentation)
目标检测(Object Detection)
目标跟踪(Visual Tracking)
语义分割(Semantic Segmentation)
实例分割(Instance Segmentation)
图像编辑(Image Editing)
Low-level Vision
超分辨率(Super-Resolution)
3D点云(3D Point Cloud)
3D目标检测(3D Object Detection)
3D目标跟踪(3D Object Tracking)
3D人体姿态估计(3D Human Pose Estimation)
3D语义场景补全(3D Semantic Scene Completion)
3D重建(3D Reconstruction)
深度估计(Depth Estimation)
车道线检测(Lane Detection)
图像修复(Image Inpainting)
人群计数(Crowd Counting)
医学图像(Medical Image)
场景图生成(Scene Graph Generation)
水印(Watermarking)
数据集(Datasets)
新任务(New Tasks)
其他(Others)

Backbone

A ConvNet for the 2020s

Paper: https://arxiv.org/abs/2201.03545
Code: https://github.com/facebookresearch/ConvNeXt
中文解读：https://mp.weixin.qq.com/s/Xg5wPYExnvTqRo6s5-2cAw

MPViT : Multi-Path Vision Transformer for Dense Prediction

Paper: https://arxiv.org/abs/2112.11010
Code: https://github.com/youngwanLEE/MPViT

CLIP

HairCLIP: Design Your Hair by Text and Reference Image

Paper: https://arxiv.org/abs/2112.05142
Code: https://github.com/wty-ustc/HairCLIP

PointCLIP: Point Cloud Understanding by CLIP

Paper: https://arxiv.org/abs/2112.02413
Code: https://github.com/ZrrSkywalker/PointCLIP

Blended Diffusion for Text-driven Editing of Natural Images

Paper: https://arxiv.org/abs/2111.14818
Code: https://github.com/omriav/blended-diffusion

NAS

ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

Paper: https://arxiv.org/abs/2111.15362
Code: None

NeRF

Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields

Homepage: https://jonbarron.info/mipnerf360/
Paper: https://arxiv.org/abs/2111.12077
Demo: https://youtu.be/YStDS2-Ln1s

Point-NeRF: Point-based Neural Radiance Fields

Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/
Paper: https://arxiv.org/abs/2201.08845
Code: https://github.com/Xharlie/point-nerf

NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images

Paper: https://arxiv.org/abs/2111.13679
Homepage: https://bmild.github.io/rawnerf/
Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc

Visual Transformer

Backbone

MPViT : Multi-Path Vision Transformer for Dense Prediction

Paper: https://arxiv.org/abs/2112.11010
Code: https://github.com/youngwanLEE/MPViT

应用

Language-based Video Editing via Multi-Modal Multi-Level Transformer

Paper: https://arxiv.org/abs/2104.01122
Code: None

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

Paper: https://arxiv.org/abs/2203.00859
Code: None

Embracing Single Stride 3D Object Detector with Sparse Transformer

Paper: https://arxiv.org/abs/2112.06375
Code: https://github.com/TuSimple/SST

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

Paper: https://arxiv.org/abs/2203.02891
Code: https://github.com/xulianuwa/MCTformer

自监督学习(Self-supervised Learning)

Crafting Better Contrastive Views for Siamese Representation Learning

Paper: https://arxiv.org/abs/2202.03278
Code: https://github.com/xyupeng/ContrastiveCrop
中文解读：https://mp.weixin.qq.com/s/VTP9D5f7KG9vg30U9kVI2A

数据增强(Data Augmentation)

TeachAugment: Data Augmentation Optimization Using Teacher Knowledge

Paper: https://arxiv.org/abs/2202.12513
Code: https://github.com/DensoITLab/TeachAugment

AlignMix: Improving representation by interpolating aligned features

Paper: https://arxiv.org/abs/2103.15375
Code: None

目标检测(Object Detection)

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

Paper: https://arxiv.org/abs/2203.01305
Code: https://github.com/FengLi-ust/DN-DETR

Localization Distillation for Dense Object Detection

Paper: https://arxiv.org/abs/2102.12252
Code: https://github.com/HikariTJU/LD
Code2: https://github.com/HikariTJU/LD
中文解读：https://mp.weixin.qq.com/s/dxss8RjJH283h6IbPCT9vg

Focal and Global Knowledge Distillation for Detectors

Paper: https://arxiv.org/abs/2111.11837
Code: https://github.com/yzd-v/FGD
中文解读：https://mp.weixin.qq.com/s/yDkreTudC8JL2V2ETsADwQ

目标跟踪(Visual Tracking)

Correlation-Aware Deep Tracking

Paper: https://arxiv.org/abs/2203.01666
Code: None

TCTrack: Temporal Contexts for Aerial Tracking

Paper: https://arxiv.org/abs/2203.01885
Code: https://github.com/vision4robotics/TCTrack

语义分割(Semantic Segmentation)

弱监督语义分割

Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

Paper: https://arxiv.org/abs/2203.00962
Code: https://github.com/zhaozhengChen/ReCAM

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

Paper: https://arxiv.org/abs/2203.02891
Code: https://github.com/xulianuwa/MCTformer

半监督语义分割

ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation

Paper: https://arxiv.org/abs/2106.05095
Code: https://github.com/LiheYoung/ST-PlusPlus
中文解读：https://mp.weixin.qq.com/s/knSnlebdtEnmrkChGM_0CA

实例分割(Instance Segmentation)

E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation

Paper: https://arxiv.org/abs/2203.04074
Code: https://github.com/zhang-tao-whu/e2ec

自监督实例分割

FreeSOLO: Learning to Segment Objects without Annotations

Paper: https://arxiv.org/abs/2202.12181
Code: None

视频实例分割

Efficient Video Instance Segmentation via Tracklet Query and Proposal

Homepage: https://jialianwu.com/projects/EfficientVIS.html
Paper: https://arxiv.org/abs/2203.01853
Demo: https://youtu.be/sSPMzgtMKCE

图像编辑(Image Editing)

Blended Diffusion for Text-driven Editing of Natural Images

Paper: https://arxiv.org/abs/2111.14818
Code: https://github.com/omriav/blended-diffusion

Low-level Vision

ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

Paper: https://arxiv.org/abs/2111.15362
Code: None

超分辨率(Super-Resolution)

视频超分辨率

BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment

Paper: https://arxiv.org/abs/2104.13371
Code: https://github.com/open-mmlab/mmediting
Code: https://github.com/ckkelvinchan/BasicVSR_PlusPlus
中文解读：https://mp.weixin.qq.com/s/HZTwYfphixyLHxlbCAxx4g

3D点云(3D Point Cloud)

A Unified Query-based Paradigm for Point Cloud Understanding

Paper: https://arxiv.org/abs/2203.01252
Code: None

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding

Paper: https://arxiv.org/abs/2203.00680
Code: https://github.com/MohamedAfham/CrossPoint

PointCLIP: Point Cloud Understanding by CLIP

Paper: https://arxiv.org/abs/2112.02413
Code: https://github.com/ZrrSkywalker/PointCLIP

3D目标检测(3D Object Detection)

Embracing Single Stride 3D Object Detector with Sparse Transformer

Paper: https://arxiv.org/abs/2112.06375
Code: https://github.com/TuSimple/SST

Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes

Paper: https://arxiv.org/abs/2011.12001
Code: https://github.com/qq456cvb/CanonicalVoting

3D目标跟踪(3D Object Tracking)

Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

Paper: https://arxiv.org/abs/2203.01730
Code: https://github.com/Ghostish/Open3DSOT

3D人体姿态估计(3D Human Pose Estimation)

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

Paper: https://arxiv.org/abs/2111.12707
Code: https://github.com/Vegetebird/MHFormer
中文解读: https://zhuanlan.zhihu.com/p/439459426

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

Paper: https://arxiv.org/abs/2203.00859
Code: None

3D语义场景补全(3D Semantic Scene Completion)

MonoScene: Monocular 3D Semantic Scene Completion

Paper: https://arxiv.org/abs/2112.00726
Code: https://github.com/cv-rits/MonoScene

3D重建(3D Reconstruction)

BANMo: Building Animatable 3D Neural Models from Many Casual Videos

Homepage: https://banmo-www.github.io/
Paper: https://arxiv.org/abs/2112.12761
Code: https://github.com/facebookresearch/banmo
中文解读：https://mp.weixin.qq.com/s/NMHP8-xWwrX40vpGx55Qew

深度估计(Depth Estimation)

单目深度估计

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

Paper: https://arxiv.org/abs/2203.01502
Code: None

OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion

Paper: https://arxiv.org/abs/2203.00838
Code: None

Toward Practical Self-Supervised Monocular Indoor Depth Estimation

Paper: https://arxiv.org/abs/2112.02306
Code: None

车道线检测(Lane Detection)

Rethinking Efficient Lane Detection via Curve Modeling

图像修复(Image Inpainting)

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding

Paper: https://arxiv.org/abs/2203.00867
Code: https://github.com/DQiaole/ZITS_inpainting

人群计数(Crowd Counting)

Leveraging Self-Supervision for Cross-Domain Crowd Counting

Paper: https://arxiv.org/abs/2103.16291
Code: None

医学图像(Medical Image)

BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation

Paper: https://arxiv.org/abs/2203.02533
Code: None

场景图生成(Scene Graph Generation)

SGTR: End-to-end Scene Graph Generation with Transformer

Paper: https://arxiv.org/abs/2112.12970
Code: None

风格迁移(Style Transfer)

StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions

Homepage: https://lukashoel.github.io/stylemesh/
Paper: https://arxiv.org/abs/2112.01530
Code: https://github.com/lukasHoel/stylemesh
Demo：https://www.youtube.com/watch?v=ZqgiTLcNcks

水印(Watermarking)

Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings

Paper: https://arxiv.org/abs/2104.13450
Code: None

数据集(Datasets)

It's About Time: Analog Clock Reading in the Wild

Homepage: https://charigyang.github.io/abouttime/
Paper: https://arxiv.org/abs/2111.09162
Code: https://github.com/charigyang/itsabouttime
Demo: https://youtu.be/cbiMACA6dRc

Toward Practical Self-Supervised Monocular Indoor Depth Estimation

Paper: https://arxiv.org/abs/2112.02306
Code: None

Kubric: A scalable dataset generator

Paper: https://arxiv.org/abs/2203.03570
Code: https://github.com/google-research/kubric

新任务(New Task)

Language-based Video Editing via Multi-Modal Multi-Level Transformer

Paper: https://arxiv.org/abs/2104.01122
Code: None

It's About Time: Analog Clock Reading in the Wild

Homepage: https://charigyang.github.io/abouttime/
Paper: https://arxiv.org/abs/2111.09162
Code: https://github.com/charigyang/itsabouttime
Demo: https://youtu.be/cbiMACA6dRc

其他(Others)

Kubric: A scalable dataset generator

Paper: https://arxiv.org/abs/2203.03570
Code: https://github.com/google-research/kubric