thanhmvu / CVPR2020-Code

CVPR 2020 Papers with open-source code

CVPR2020-Code

A list of CVPR 2020 Papers with open-source code, forked and translated from https://github.com/amusi/CVPR2020-Code

CNN

Exploring Self-attention for Image Recognition

Improving Convolutional Networks with Self-Calibrated Convolutions

Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets

Image classification

Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion

Spatially Attentive Output Layer for Image Classification

Object Detection

AugFPN: Improving Multi-scale Feature Learning for Object Detection

Noise-Aware Fully Webly Supervised Object Detection

Learning a Unified Sample Weighting Network for Object Detection

D2Det: Towards High Quality Object Detection and Instance Segmentation

Dynamic Refinement Network for Oriented and Densely Packed Object Detection

Scale-Equalizing Pyramid Convolution for Object Detection

Revisiting the Sibling Head in Object Detector

Detection in Crowded Scenes: One Proposal, Multiple Predictions

Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection

Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection

BiDet: An Efficient Binarized Object Detector

Harmonizing Transferability and Discriminability for Adapting Object Detectors

CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection

Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection

EfficientDet: Scalable and Efficient Object Detection

3D object detection

Structure Aware Single-stage 3D Object Detection from Point Cloud

IDA-3D: Instance-Depth-Aware 3D Object Detection from Stereo Vision for Autonomous Driving

Train in Germany, Test in The USA: Making 3D Object Detectors Generalize

MLCVNet: Multi-Level Context VoteNet for 3D Object Detection

3DSSD: Point-based 3D Single Stage Object Detector

Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation

End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection

DSGN: Deep Stereo Geometry Network for 3D Object Detection

LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention

PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection

Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud

Video object detection

Memory Enhanced Global-Local Aggregation for Video Object Detection

Object Tracking

SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking

D3S -- A Discriminative Single Shot Segmentation Tracker

ROAM: Recurrently Optimizing Tracking Model

Siam R-CNN: Visual Tracking by Re-Detection

Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises

High-Performance Long-Term Tracking with Meta-Updater

AutoTrack: Towards High-Performance Visual Tracking for UAV with Automatic Spatio-Temporal Regularization

Probabilistic Regression for Visual Tracking

MAST: A Memory-Augmented Self-supervised Tracker

Siamese Box Adaptive Network for Visual Tracking

Semantic segmentation

Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation

Single-Stage Semantic Segmentation from Image Labels

Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation

MSeg: A Composite Dataset for Multi-domain Semantic Segmentation

CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement

Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision

Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

Temporally Distributed Networks for Fast Video Segmentation

Context Prior for Scene Segmentation

Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

Cars Can't Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks

Learning Dynamic Routing for Semantic Segmentation

Instance segmentation

D2Det: Towards High Quality Object Detection and Instance Segmentation

PolarMask: Single Shot Instance Segmentation with Polar Representation

CenterMask: Real-Time Anchor-Free Instance Segmentation

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

Deep Snake for Real-Time Instance Segmentation

Mask Encoding for Single Shot Instance Segmentation

Panoptic segmentation

Pixel Consensus Voting for Panoptic Segmentation

BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation

Video object segmentation

A Transductive Approach for Video Object Segmentation

State-Aware Tracker for Real-Time Video Object Segmentation

Learning Fast and Robust Target Models for Video Object Segmentation

Learning Video Object Segmentation from Unlabeled Videos

Superpixel segmentation

Superpixel Segmentation with Fully Convolutional Networks

NAS

AOWS: Adaptive and optimal network width search with latency constraints

Densely Connected Search Space for More Flexible Neural Architecture Search

MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning

FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions

Neural Architecture Search for Lightweight Non-Local Networks

Rethinking Performance Estimation in Neural Architecture Search

CARS: Continuous Evolution for Efficient Neural Architecture Search

GAN

Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning

PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer

Semantically Multi-modal Image Synthesis

Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping

Learning to Cartoonize Using White-box Cartoon Representations

GAN Compression: Efficient Architectures for Interactive Conditional GANs

Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions

Re-ID

COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification

Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking

Pose-guided Visible Part Matching for Occluded Person ReID

Weakly supervised discriminative feature learning with state information for person identification

3D point cloud (classification/segmentation/registration, etc.)

3D point cloud convolution

PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling

Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds

Grid-GCN for Fast and Scalable Point Cloud Learning

FPConv: Learning Local Flattening for Point Convolution

3D point cloud classification

PointAugment: an Auto-Augmentation Framework for Point Cloud Classification

3D point cloud semantic segmentation

RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds

Weakly Supervised Semantic Point Cloud Segmentation: Towards 10X Fewer Labels

PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation

Learning to Segment 3D Point Clouds in 2D Image Space

3D point cloud instance segmentation

PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation

3D point cloud registration

D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features

RPM-Net: Robust Point Matching using Learned Features

3D point cloud completion

Cascaded Refinement Network for Point Cloud Completion

3D point cloud object tracking

P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds

Face

Face recognition

CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition

Learning Meta Face Recognition in Unseen Domains

Face Detection

Face anti-spoofing

Searching Central Difference Convolutional Networks for Face Anti-Spoofing

Facial expression recognition

Suppressing Uncertainties for Large-Scale Facial Expression Recognition

Face frontalization

Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

Face 3D reconstruction

AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"

FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction

Human pose estimation (2D/3D)

2D human pose estimation

HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation

Distribution-Aware Coordinate Representation for Human Pose Estimation

3D human pose estimation

Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach

Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data

Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis

Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation

VIBE: Video Inference for Human Body Pose and Shape Estimation

Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation

Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS

Human parsing

Correlating Edge, Pose with Parsing

Scene text detection

ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection

UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World

ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection

Scene text recognition

SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World

ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

Feature (point) detection and description

SuperGlue: Learning Feature Matching with Graph Neural Networks

Super Resolution

Image Super Resolution

Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution

Learning Texture Transformer Network for Image Super-Resolution

Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining

Structure-Preserving Super Resolution with Gradient Guidance

Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy

Video Super Resolution

TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution

Space-Time-Aware Multi-Resolution Video Enhancement

Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

Model compression/pruning

DMCP: Differentiable Markov Channel Pruning for Neural Networks

Forward and Backward Information Retention for Accurate Binary Neural Networks

Towards Efficient Model Compression via Learned Global Ranking

HRank: Filter Pruning using High-Rank Feature Map

GAN Compression: Efficient Architectures for Interactive Conditional GANs

Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression

Video understanding/action recognition

Oops! Predicting Unintentional Action in Video

PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition

Intra- and Inter-Action Understanding via Temporal Action Parsing

3DV: 3D Dynamic Voxel for Action Recognition in Depth Video

FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding

TEA: Temporal Excitation and Aggregation for Action Recognition

X3D: Expanding Architectures for Efficient Video Recognition

Temporal Pyramid Network for Action Recognition

Skeleton-based action recognition

Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

Crowd Counting

Depth estimation

BiFuse: Monocular 360° Depth Estimation via Bi-Projection Fusion

Focus on defocus: bridging the synthetic to real domain gap for depth estimation

Bi3D: Stereo Depth Estimation via Binary Classifications

AANet: Adaptive Aggregation Network for Efficient Stereo Matching

Towards Better Generalization: Joint Depth-Pose Learning without PoseNet

Monocular depth estimation

On the uncertainty of self-supervised monocular depth estimation

3D Packing for Self-Supervised Monocular Depth Estimation

Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation

6D Pose Estimation

MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

EPOS: Estimating 6D Pose of Objects with Symmetries

G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features

Hand pose estimation

HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation

Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data

Saliency Detection

JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection

UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders

Denoising

A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising

CycleISP: Real Image Restoration via Improved Data Synthesis

Deraining

Multi-Scale Progressive Fusion Network for Single Image Deraining

Deblurring

Video deblurring

Cascaded Deep Video Deblurring Using Temporal Sharpness Prior

Dehazing

Multi-Scale Boosted Dehazing Network with Dense Feature Fusion

Feature point detection and description

ASLFeat: Learning Local Features of Accurate Shape and Localization

Visual Question Answering (VQA)

VC R-CNN: Visual Commonsense R-CNN

VideoQA

Hierarchical Conditional Relation Networks for Video Question Answering

Vision-and-language navigation

Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training

Video compression

Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement

Video frame interpolation

FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation

Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

Space-Time-Aware Multi-Resolution Video Enhancement

Scene-Adaptive Video Frame Interpolation via Meta-Learning

Softmax Splatting for Video Frame Interpolation

Style transfer

Diversified Arbitrary Style Transfer via Deep Feature Perturbation

Collaborative Distillation for Ultra-Resolution Universal Style Transfer

Lane detection

Inter-Region Affinity Distillation for Road Marking Segmentation

Human-Object Interaction (HOI) Detection

PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection

Detailed 2D-3D Joint Representation for Human-Object Interaction

Cascaded Human-Object Interaction Recognition

VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions

Trajectory prediction

The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction

Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction

Motion prediction

Collaborative Motion Prediction via Neural Motion Message Passing

MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps

Optical flow estimation

Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation

Image Retrieval

Evade Deep Image Retrieval by Stashing Private Images in the Hash Space

Virtual Try-On

Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content

HDR

Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline

Adversarial examples

Towards Large yet Imperceptible Adversarial Image Perturbations with Perceptual Color Distance

3D Reconstruction

Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild

Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

Depth completion

Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End

Semantic scene completion

3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior

Image/Video Captioning

Syntax-Aware Action Targeting for Video Captioning

Wireframe parsing

Holistically-Attracted Wireframe Parser

Dataset

Oops! Predicting Unintentional Action in Video

The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction

Open Compound Domain Adaptation

Intra- and Inter-Action Understanding via Temporal Action Parsing

Dynamic Refinement Network for Oriented and Densely Packed Object Detection

COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification

KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations

MSeg: A Composite Dataset for Multi-domain Semantic Segmentation

AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"

Learning to Autofocus

FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction

Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data

FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding

A Local-to-Global Approach to Multi-modal Movie Scene Segmentation

Deep Homography Estimation for Dynamic Scenes

Assessing Image Quality Issues for Real-World Problems

UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World

PANDA: A Gigapixel-level Human-centric Video Dataset

IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning

Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS

Others

CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus

Learning to Learn Single Domain Generalization

Open Compound Domain Adaptation

Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision

QEBA: Query-Efficient Boundary-Based Blackbox Attack

Equalization Loss for Long-Tailed Object Recognition

Instance-aware Image Colorization

Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting

Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching

Epipolar Transformers

Bringing Old Photos Back to Life

MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask

Self-Supervised Viewpoint Learning from Image Collections

Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations

Towards Learning Structure via Consensus for Face Segmentation and Parsing

Plug-and-Play Algorithms for Large-scale Snapshot Compressive Imaging

Lightweight Photometric Stereo for Facial Details Recovery

Footprints and Free Space from a Single Color Image

Self-Supervised Monocular Scene Flow Estimation

Quasi-Newton Solver for Robust Non-Rigid Registration

A Local-to-Global Approach to Multi-modal Movie Scene Segmentation

DeepFLASH: An Efficient Network for Learning-based Medical Image Registration

Self-Supervised Scene De-occlusion

Polarized Reflection Removal with Perfect Alignment in the Wild

Background Matting: The World is Your Green Screen

What Deep CNNs Benefit from Global Covariance Pooling: An Optimization Perspective

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Video Object Grounding using Semantic Roles in Language Description

Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives

SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization

On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location

GhostNet: More Features from Cheap Operations

AdderNet: Do We Really Need Multiplications in Deep Learning?

Deep Image Harmonization via Domain Verification

Blurry Video Frame Interpolation

Extremely Dense Point Correspondences using a Learned Feature Descriptor

Filter Grafting for Deep Neural Networks

Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation

Detecting Attended Visual Targets in Video

Deep Image Spatial Transformation for Person Image Generation

Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications

https://github.com/charlesCXK/3D-SketchAware-SSC

https://github.com/Anonymous20192020/Anonymous_CVPR5767

https://github.com/avirambh/ScopeFlow

https://github.com/csbhr/CDVD-TSP

https://github.com/ymcidence/TBH

https://github.com/yaoyao-liu/mnemonics

https://github.com/meder411/Tangent-Images

https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch

https://github.com/sjmoran/deep_local_parametric_filters

https://github.com/bermanmaxim/AOWS

https://github.com/dc3ea9f/look-into-object

Not-Sure

FADNet: A Fast and Accurate Network for Disparity Estimation

https://github.com/rFID-submit/RandomFID (not sure)

https://github.com/JackSyu/AE-MSR (not sure)

https://github.com/fastconvnets/cvpr2020 (not sure)

https://github.com/aimagelab/meshed-memory-transformer (not sure)

https://github.com/TWSFar/CRGNet (not sure)

https://github.com/CVPR-2020/CDARTS (not sure)

https://github.com/anucvml/ddn-cvprw2020 (not sure)

https://github.com/dl-model-recommend/model-trust (not sure)

https://github.com/apratimbhattacharyya18/CVPR-2020-Corr-Prior (not sure)

https://github.com/onetcvpr/O-Net (not sure)

https://github.com/502463708/Microcalcification_Detection (not sure)

https://github.com/anonymous-for-review/cvpr-2020-deep-smoke-machine (not sure)

https://github.com/anonymous-for-review/cvpr-2020-smoke-recognition-dataset (not sure)

https://github.com/cvpr-nonrigid/dataset (not sure)

https://github.com/theFool32/PPBA (not sure)

https://github.com/Realtime-Action-Recognition/Realtime-Action-Recognition
