CVPR 2022 论文和开源项目合集(Papers with Code)
CVPR 2022 论文和开源项目合集(papers with code)!
CVPR 2022 收录列表ID:https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view
注1:欢迎各位大佬提交issue,分享CVPR 2022论文和开源项目!
注2:关于往年CV顶会论文以及其他优质CV论文和大盘点,详见: https://github.com/amusi/daily-paper-computer-vision
如果你想了解最新最优质的的CV论文、开源项目和学习资料,欢迎扫码加入【CVer学术交流群】!互相学习,一起进步~
【CVPR 2022 论文开源目录】
- Backbone
- CLIP
- NAS
- NeRF
- Visual Transformer
- 自监督学习(Self-supervised Learning)
- 数据增强(Data Augmentation)
- 目标检测(Object Detection)
- 目标跟踪(Visual Tracking)
- 语义分割(Semantic Segmentation)
- 实例分割(Instance Segmentation)
- 图像编辑(Image Editing)
- Low-level Vision
- 超分辨率(Super-Resolution)
- 3D点云(3D Point Cloud)
- 3D目标检测(3D Object Detection)
- 3D目标跟踪(3D Object Tracking)
- 3D人体姿态估计(3D Human Pose Estimation)
- 3D语义场景补全(3D Semantic Scene Completion)
- 3D重建(3D Reconstruction)
- 深度估计(Depth Estimation)
- 车道线检测(Lane Detection)
- 图像修复(Image Inpainting)
- 人群计数(Crowd Counting)
- 医学图像(Medical Image)
- 场景图生成(Scene Graph Generation)
- 水印(Watermarking)
- 数据集(Datasets)
- 新任务(New Tasks)
- 其他(Others)
Backbone
A ConvNet for the 2020s
MPViT : Multi-Path Vision Transformer for Dense Prediction
CLIP
HairCLIP: Design Your Hair by Text and Reference Image
PointCLIP: Point Cloud Understanding by CLIP
Blended Diffusion for Text-driven Editing of Natural Images
NAS
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
- Paper: https://arxiv.org/abs/2111.15362
- Code: None
NeRF
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
-
Homepage: https://jonbarron.info/mipnerf360/
Point-NeRF: Point-based Neural Radiance Fields
- Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/
- Paper: https://arxiv.org/abs/2201.08845
- Code: https://github.com/Xharlie/point-nerf
NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images
-
Homepage: https://bmild.github.io/rawnerf/
Visual Transformer
Backbone
MPViT : Multi-Path Vision Transformer for Dense Prediction
应用
Language-based Video Editing via Multi-Modal Multi-Level Transformer
- Paper: https://arxiv.org/abs/2104.01122
- Code: None
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
- Paper: https://arxiv.org/abs/2203.00859
- Code: None
Embracing Single Stride 3D Object Detector with Sparse Transformer
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
自监督学习(Self-supervised Learning)
Crafting Better Contrastive Views for Siamese Representation Learning
数据增强(Data Augmentation)
TeachAugment: Data Augmentation Optimization Using Teacher Knowledge
AlignMix: Improving representation by interpolating aligned features
- Paper: https://arxiv.org/abs/2103.15375
- Code: None
目标检测(Object Detection)
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
Localization Distillation for Dense Object Detection
- Paper: https://arxiv.org/abs/2102.12252
- Code: https://github.com/HikariTJU/LD
- Code2: https://github.com/HikariTJU/LD
- 中文解读:https://mp.weixin.qq.com/s/dxss8RjJH283h6IbPCT9vg
Focal and Global Knowledge Distillation for Detectors
目标跟踪(Visual Tracking)
Correlation-Aware Deep Tracking
- Paper: https://arxiv.org/abs/2203.01666
- Code: None
TCTrack: Temporal Contexts for Aerial Tracking
语义分割(Semantic Segmentation)
弱监督语义分割
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
半监督语义分割
ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2106.05095
- Code: https://github.com/LiheYoung/ST-PlusPlus
- 中文解读:https://mp.weixin.qq.com/s/knSnlebdtEnmrkChGM_0CA
实例分割(Instance Segmentation)
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation
自监督实例分割
FreeSOLO: Learning to Segment Objects without Annotations
- Paper: https://arxiv.org/abs/2202.12181
- Code: None
视频实例分割
Efficient Video Instance Segmentation via Tracklet Query and Proposal
- Homepage: https://jialianwu.com/projects/EfficientVIS.html
- Paper: https://arxiv.org/abs/2203.01853
- Demo: https://youtu.be/sSPMzgtMKCE
图像编辑(Image Editing)
Blended Diffusion for Text-driven Editing of Natural Images
Low-level Vision
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
- Paper: https://arxiv.org/abs/2111.15362
- Code: None
超分辨率(Super-Resolution)
视频超分辨率
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
- Paper: https://arxiv.org/abs/2104.13371
- Code: https://github.com/open-mmlab/mmediting
- Code: https://github.com/ckkelvinchan/BasicVSR_PlusPlus
- 中文解读:https://mp.weixin.qq.com/s/HZTwYfphixyLHxlbCAxx4g
3D点云(3D Point Cloud)
A Unified Query-based Paradigm for Point Cloud Understanding
- Paper: https://arxiv.org/abs/2203.01252
- Code: None
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
PointCLIP: Point Cloud Understanding by CLIP
3D目标检测(3D Object Detection)
Embracing Single Stride 3D Object Detector with Sparse Transformer
Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
3D目标跟踪(3D Object Tracking)
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
3D人体姿态估计(3D Human Pose Estimation)
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
- Paper: https://arxiv.org/abs/2203.00859
- Code: None
3D语义场景补全(3D Semantic Scene Completion)
MonoScene: Monocular 3D Semantic Scene Completion
3D重建(3D Reconstruction)
BANMo: Building Animatable 3D Neural Models from Many Casual Videos
- Homepage: https://banmo-www.github.io/
- Paper: https://arxiv.org/abs/2112.12761
- Code: https://github.com/facebookresearch/banmo
- 中文解读:https://mp.weixin.qq.com/s/NMHP8-xWwrX40vpGx55Qew
深度估计(Depth Estimation)
单目深度估计
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
- Paper: https://arxiv.org/abs/2203.01502
- Code: None
OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
- Paper: https://arxiv.org/abs/2203.00838
- Code: None
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
- Paper: https://arxiv.org/abs/2112.02306
- Code: None
车道线检测(Lane Detection)
Rethinking Efficient Lane Detection via Curve Modeling
图像修复(Image Inpainting)
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
人群计数(Crowd Counting)
Leveraging Self-Supervision for Cross-Domain Crowd Counting
- Paper: https://arxiv.org/abs/2103.16291
- Code: None
医学图像(Medical Image)
BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation
- Paper: https://arxiv.org/abs/2203.02533
- Code: None
场景图生成(Scene Graph Generation)
SGTR: End-to-end Scene Graph Generation with Transformer
- Paper: https://arxiv.org/abs/2112.12970
- Code: None
风格迁移(Style Transfer)
StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions
-
Homepage: https://lukashoel.github.io/stylemesh/
水印(Watermarking)
Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings
- Paper: https://arxiv.org/abs/2104.13450
- Code: None
数据集(Datasets)
It's About Time: Analog Clock Reading in the Wild
- Homepage: https://charigyang.github.io/abouttime/
- Paper: https://arxiv.org/abs/2111.09162
- Code: https://github.com/charigyang/itsabouttime
- Demo: https://youtu.be/cbiMACA6dRc
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
- Paper: https://arxiv.org/abs/2112.02306
- Code: None
Kubric: A scalable dataset generator
新任务(New Task)
Language-based Video Editing via Multi-Modal Multi-Level Transformer
- Paper: https://arxiv.org/abs/2104.01122
- Code: None
It's About Time: Analog Clock Reading in the Wild
-
Homepage: https://charigyang.github.io/abouttime/
其他(Others)
Kubric: A scalable dataset generator