This repo supplements our survey, 3D Vision with Transformers: A Survey.

Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang

This repo lists all the Transformer-based 3D computer vision papers presented in our survey, and we aim to update it frequently with the latest relevant papers.

Object Classification

Point Transformer [PDF][Code]

Point Transformer [PDF]

Attentional shapecontextnet for point cloud recognition [PDF]

Modeling point clouds with self-attention and gumbel subset sampling [PDF]

PCT: Point cloud transformer [PDF]

PVT: Point-Voxel Transformer for Point Cloud Learning [PDF]

Sewer defect detection from 3D point clouds using a transformer-based deep learning model [PDF]

Adaptive Wavelet Transformer Network for 3D Shape Representation Learning [PDF]

3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification [PDF]

Dual Transformer for Point Cloud Analysis [PDF]

CpT: Convolutional Point Transformer for 3D Point Cloud Processing [PDF]

LFT-Net: Local Feature Transformer Network for Point Clouds Analysis [Paper]

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [PDF]

Group-in-Group Relation-Based Transformer for 3D Point Cloud Learning [PDF]

Masked Autoencoders for Point Cloud Self-supervised Learning [PDF]

Patchformer: A versatile 3d transformer based on patch attention [PDF]

3crossnet: Cross-level cross-scale cross-attention network for point cloud representation [PDF]

3d medical point transformer: Introducing convolution to attention networks for medical point cloud analysis [PDF]

Centroid transformers: Learning to abstract with attention [PDF]

Point cloud learning with transformer [PDF]

3D Object Detection

3D object detection with pointformer [PDF]

Voxel Transformer for 3D Object Detection [PDF]

Improving 3D Object Detection with Channel-wise Transformer [PDF]

Group-Free 3D Object Detection via Transformers [PDF]

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [PDF]

An End-to-End Transformer Model for 3D Object Detection [PDF]

SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [PDF]

M3DETR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers [PDF]

Embracing Single Stride 3D Object Detector with Sparse Transformer [PDF]

Fast Point Transformer [PDF]

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds [PDF]

ARM3D: Attention-based relation module for indoor 3D object detection [PDF]

Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving [PDF]

Attention-based Proposals Refinement for 3D Object Detection [PDF]

MLCVNet: Multi-Level Context VoteNet for 3D Object Detection [PDF]

LiDAR-based online 3d video object detection with graph-based message passing and spatiotemporal transformer attention [PDF]

SCANet: Spatial-channel attention network for 3d object detection [Paper]

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection [PDF]

TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [PDF]

CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection [PDF]

PETR: Position Embedding Transformation for Multi-View 3D Object Detection [PDF]

BoxeR: Box-Attention for 2D and 3D Transformers [PDF]

Bridged Transformer for Vision and Point Cloud 3D Object Detection [PDF]

VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention [PDF]

Point Density-Aware Voxels for LiDAR 3D Object Detection [PDF]

3D Segmentation

For part segmentation, see the Object Classification section.

Complete Scenes Segmentation

Fast Point Transformer [PDF]

Stratified Transformer for 3D Point Cloud Segmentation [PDF]

Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation [PDF]

Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation [PDF]

Point Cloud Video Segmentation

Point 4D transformer networks for spatio-temporal modeling in point cloud videos [PDF]

Spatial-Temporal Transformer for 3D Point Cloud Sequences [PDF]

Medical Imaging Segmentation

Unetr: Transformers for 3d medical image segmentation [PDF]

D-Former: A U-shaped Dilated Transformer for 3D Medical Image Segmentation [PDF]

Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation [PDF]

T-AutoML: Automated Machine Learning for Lesion Segmentation using Transformers in 3D Medical Imaging [PDF]

Transfuse: Fusing transformers and cnns for medical image segmentation [PDF]

Convolution-free medical image segmentation using transformers [PDF]

Spectr: Spectral transformer for hyperspectral pathology image segmentation [PDF]

Transbts: Multimodal brain tumor segmentation using transformer [PDF]

Medical image segmentation using squeeze-and-expansion transformers [PDF]

nnformer: Interleaved transformer for volumetric segmentation [PDF]

Bitr-unet: a cnn-transformer combined network for mri brain tumor segmentation [PDF]

After-unet: Axial fusion transformer unet for medical image segmentation [PDF]

A volumetric transformer for accurate 3d tumor segmentation [PDF]

Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images [PDF]

3D Point Cloud Completion

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers [PDF]

Learning Local Displacements for Point Cloud Completion [PDF]

PointAttN: You Only Need Attention for Point Cloud Completion [PDF]

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer [PDF]

Point cloud completion on structured feature map with feedback network [PDF]

PCTMA-Net: Point Cloud Transformer with Morphing Atlas-based Point Generation Network for Dense Point Cloud Completion [PDF]

AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation [PDF]

ShapeFormer: Transformer-based Shape Completion via Sparse Representation [PDF]

A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion [PDF]

MFM-Net: Unpaired Shape Completion Network with Multi-stage Feature Matching [PDF]

3D Pose Estimation

3D Human Pose Estimation with Spatial and Temporal Transformers [PDF]

CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation [PDF]

Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation [PDF]

P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation [PDF]

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video [PDF]

Epipolar Transformer for Multi-view Human Pose Estimation [PDF]

RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers [PDF]

Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation [PDF]

Efficient Virtual View Selection for 3D Hand Pose Estimation [PDF]

End-to-End Human Pose and Mesh Reconstruction with Transformers [PDF]

HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation [PDF]

Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation [PDF]

PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation [PDF]

Permutation-Invariant Relational Network for Multi-person 3D Pose Estimation [PDF]

6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-based Instance Representation Learning [PDF]

Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World [PDF]

Zero-Shot Category-Level Object Pose Estimation [PDF]

Other Tasks

3D Tracking

3d object tracking with transformer [PDF]

Pttr: Relational 3d point cloud object tracking with transformer [PDF]

3D Motion Prediction

History repeats itself: Human motion prediction via motion attention [PDF]

Learning progressive joint propagation for human motion prediction [PDF]

A spatio-temporal transformer for 3d human motion prediction [PDF]

Hr-stan: High-resolution spatio-temporal attention network for 3d human motion prediction [PDF]

Pose transformers (potr): Human motion prediction with non-autoregressive transformer [PDF]

Gimo: Gaze-informed human motion prediction in context [PDF]

3D Reconstruction

Multi-view 3d reconstruction with transformer [PDF]

Thundr: Transformer-based 3d human reconstruction with markers [PDF]

Vpfusion: Joint 3d volume and pixel-aligned feature fusion for single and multi-view 3d reconstruction [PDF]

Point Cloud Registration

Deep closest point: Learning representations for point cloud registration [PDF]

Robust point cloud registration framework based on deep graph matching [PDF]

Regtr: End-to-end point cloud correspondences with transformer [PDF]

Citation

If you find the listing or the survey useful for your work, please cite our paper:

@misc{lahoud20223d,
      title={3D Vision with Transformers: A Survey}, 
      author={Lahoud, Jean and Cao, Jiale and Khan, Fahad Shahbaz and Cholakkal, Hisham and Anwer, Rao Muhammad and Khan, Salman and Yang, Ming-Hsuan},
      year={2022},
      eprint={2208.04309},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
