A154609 / ICCV2021-Papers-with-Code-Demo

ICCV 2021 paper with code


ICCV2021-Papers-with-Code-Demo

☪️ Paper download:

Password: aicv

CVPR 2021 collection: https://github.com/DWCTOD/CVPR2021-Papers-with-Code-Demo

Paper download: https://pan.baidu.com/share/init?surl=gjfUQlPf73MCk4vM8VbzoA

Password: aicv

🌟 Continuously updated with the latest ICCV 2021 papers and their open-source code!

🚗 ICCV 2021 accepted paper list: https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRfaTmsNweuaA0Gjyu58H_Cx56pGwFhcTYII0u1pg0U7MbhlgY0R6Y-BbK3xFhAiwGZ26u3TAtN5MnS/pubhtml

🚗 Official website: http://iccv2021.thecvf.com/home

⏲️ Timeline ⌚ Paper acceptance announcement: July 23, 2021

✋ Note: Everyone is welcome to open an issue sharing ICCV 2021 papers and open-source projects, and to help improve this project together!

✈️ For convenient downloading, papers have been stored in the folder; ✔️ indicates that the paper has been downloaded / Paper Download

🎆 Join the group | Welcome

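An ICCV 2021 paper discussion group has been set up! Authors whose papers have been accepted can add WeChat: nvshenj125 with the note "ICCV + name + school/company name". Requests must follow this format to be added to the group.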

🔨 Table of Contents (click to jump directly)

Backbone

✔️Conformer: Local Features Coupling Global Representations for Visual Recognition

Reg-IBP: Efficient and Scalable Neural Network Robustness Training via Interval Bound Propagation

Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?

Back to contents

Dataset

✔️FineAction: A Fine-Grained Video Dataset for Temporal Action Localization

✔️MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions

Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation

Back to contents

Loss

Bias Loss for Mobile Neural Networks

Focal Frequency Loss for Image Reconstruction and Synthesis

Orthogonal Projection Loss

Rank & Sort Loss for Object Detection and Instance Segmentation (Oral)

Back to contents

Vision Transformer

AutoFormer: Searching Transformers for Visual Recognition

HiFT: Hierarchical Feature Transformer for Aerial Tracking

High-Fidelity Pluralistic Image Completion with Transformers

Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers (Oral)

PlaneTR: Structure-Guided Transformers for 3D Plane Recovery

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Rethinking and Improving Relative Position Encoding for Vision Transformer

Rethinking Spatial Dimensions of Vision Transformers

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

✔️Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

✔️Visual Transformer with Statistical Test for COVID-19 Classification

Visual Saliency Transformer

Back to contents

Object Detection

Active Learning for Deep Object Detection via Probabilistic Modeling

Boosting Weakly Supervised Object Detection via Learning Bounding Box Adjusters

Conditional Variational Capsule Network for Open Set Recognition

DetCo: Unsupervised Contrastive Learning for Object Detection

Detecting Invisible People

FMODetect: Robust Detection and Trajectory Estimation of Fast Moving Objects

GraphFPN: Graph Feature Pyramid Network for Object Detection

MDETR: Modulated Detection for End-to-End Multi-Modal Understanding

Rank & Sort Loss for Object Detection and Instance Segmentation (Oral)

Back to contents

3D Object Detection

Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency

Back to contents

Object Tracking

Learn to Match: Automatic Matching Network Design for Visual Tracking

Back to contents

Image Semantic Segmentation

Calibrated Adversarial Refinement for Stochastic Semantic Segmentation

Exploring Cross-Image Pixel Contrast for Semantic Segmentation (Oral)

Enhanced Boundary Learning for Glass-like Object Segmentation

Labels4Free: Unsupervised Segmentation using StyleGAN

Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation

Mining Latent Classes for Few-shot Segmentation (Oral)

Personalized Image Semantic Segmentation

Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation (Oral)

Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation

Back to contents

3D Semantic Segmentation

VMNet: Voxel-Mesh Network for Geodesic-aware 3D Semantic Segmentation

Back to contents

Instance Segmentation

CDNet: Centripetal Direction Network for Nuclear Instance Segmentation

✔️Crossover Learning for Fast Online Video Instance Segmentation

✔️Instances as Queries

Rank & Sort Loss for Object Detection and Instance Segmentation (Oral)

Back to contents

Video Semantic Segmentation

Back to contents

Medical Image Segmentation

Recurrent Mask Refinement for Few-Shot Medical Image Segmentation

Back to contents

GAN

Manifold Matching via Deep Metric Learning for Generative Modeling

Toward Spatially Unbiased Generative Models

Back to contents

Fine-Grained Visual Categorization

Benchmark Platform for Ultra-Fine-Grained Visual Categorization Beyond Human Performance

Back to contents

Geometric Deep Learning

Manifold Matching via Deep Metric Learning for Generative Modeling

Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation

Back to contents

Zero/Few Shot

Domain Generalization via Gradient Surgery

Generalized Source-free Domain Adaptation

Back to contents

Human Actions

Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition

✔️FineAction: A Fine-Grained Video Dataset for Temporal Action Localization

✔️MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions

Back to contents

Sign Language Recognition

Visual Alignment Constraint for Continuous Sign Language Recognition

Back to contents

Pose Estimation

Hand-Object Contact Consistency Reasoning for Human Grasps Generation

PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop

Back to contents

6D Object Pose Estimation

RePOSE: Real-Time Iterative Rendering and Refinement for 6D Object Pose Estimation

Back to contents

Face Reconstruction

Towards High Fidelity Monocular Face Reconstruction with Rich Reflectance using Self-supervised Learning and Ray Tracing

Back to contents

Person Re-Identification

Learning Instance-level Spatial-Temporal Patterns for Person Re-identification

Learning Compatible Embeddings

TransReID: Transformer-based Object Re-Identification

Back to contents

Crowd Counting

Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework (Oral)

Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting

Back to contents

Motion Forecasting

RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting

Back to contents

Face Anti-Spoofing

CL-Face-Anti-spoofing

Back to contents

Deepfake

Back to contents

Adversarial Attacks

TkML-AP: Adversarial Attacks to Top-k Multi-Label Learning

Cross-Modal Retrieval

Wasserstein Coupled Graph Learning for Cross-Modal Retrieval

  • Paper: None
  • Code: None

Back to contents

Depth Estimation

AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network

Motion Basis Learning for Unsupervised Deep Homography Estimation with Subspace Projection

Back to contents

Video Frame Interpolation

✔️XVFI: eXtreme Video Frame Interpolation (Oral)

Back to contents

NeRF

GNeRF: GAN-based Neural Radiance Field without Posed Camera

In-Place Scene Labelling and Understanding with Implicit Scene Representation (Oral)

KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs

Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis

UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction (Oral)

Back to contents

Super-Resolution

Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks

Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation

Back to contents

Image Reconstruction

Equivariant Imaging: Learning Beyond the Range Space (Oral)

Back to contents

Image Desnowing

ALL Snow Removed: Single Image Desnowing Algorithm Using Hierarchical Dual-tree Complex Wavelet Representation and Contradict Channel Loss

Back to contents

Image Enhancement

Gap-closing Matters: Perceptual Quality Assessment and Optimization of Low-Light Image Enhancement

Back to contents

Matching

Multi-scale Matching Networks for Semantic Correspondence

Back to contents

Hand-Object Interaction

✔️CPF: Learning a Contact Potential Field to Model the Hand-object Interaction

Back to contents

Gaze Estimation

Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation

Back to contents

Contrastive Learning

Social NCE: Contrastive Learning of Socially-aware Motion Representations

Parametric Contrastive Learning

Back to contents

Graph Convolution Networks

MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction

Back to contents

Model Compression

Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks

Back to contents

Point Cloud

InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring

MVP Benchmark: Multi-View Partial Point Clouds for Completion and Registration

Out-of-Core Surface Reconstruction via Global TGV Minimization

ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation

Unsupervised Point Cloud Pre-Training via View-Point Occlusion, Completion

Vis2Mesh: Efficient Mesh Reconstruction from Unstructured Point Clouds of Large Scenes with Learned Virtual View Visibility

Walk in the Cloud: Learning Curves for Point Clouds Shape Analysis

Back to contents

Font Generation

✔️Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts

Back to contents

Text Detection

Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection

Back to contents

Scene Text Recognition

From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network

Back to contents

Autonomous Driving

Road-Challenge-Event-Detection-for-Situation-Awareness-in-Autonomous-Driving

Back to contents

VisDrone Detection

ICCV2021_Visdrone_detection

Back to contents

Others

Cross-Camera Convolutional Color Constancy

Learnable Boundary Guided Adversarial Training

Prior-Enhanced network with Meta-Prototypes (PEMP)

MDETR: Modulated Detection for End-to-End Multi-Modal Understanding

Generalized-Shuffled-Linear-Regression (Oral)

VLGrammar: Grounded Grammar Induction of Vision and Language

A New Journey from SDRTV to HDRTV

IICNet: A Generic Framework for Reversible Image Conversion

Structure-Preserving Deraining with Residue Channel Prior Guidance

Learning with Noisy Labels via Sparse Regularization

Neural Strokes: Stylized Line Drawing of 3D Shapes

COOKIE: Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation

RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth

ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description

Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction

CanvasVAE: Learning to Generate Vector Graphic Documents

Back to contents