open-papernotes

The idea of paper notes was inspired by Hongjie Peng, and the note-taking methodology was inspired by Qi Zeng, Daniel Seita, Adrian Colyer, and Denny Britz.

This repo contains my notes for research papers that I've read.

Rubrics from `Y2021`:

Papers are rated on a 1 to 5 scale in the following aspects:

C: I understand the research problem and Challenges.
M: I understand the Main idea and the main contributions to the literature.
E: I am familiar with the details of Experiments.
L: I am able to find out the Limitations of the proposed method.
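
The four scores appear in the paper list as a compact code such as `C3M2E1L1`. As a minimal sketch of how such a code could be read programmatically, assuming each aspect always carries a single digit from 1 to 5 in the order C, M, E, L (the helper `parse_rating` and the `ASPECTS` table are hypothetical and not part of this repo):

```python
import re
from typing import Dict

# Hypothetical helper for illustration only; not part of this repo.
# Assumes the code is always C, M, E, L in that order, each followed by 1-5.
RATING_PATTERN = re.compile(r"C([1-5])M([1-5])E([1-5])L([1-5])")

# Aspect letters mapped to the rubric item they stand for.
ASPECTS = {
    "C": "Challenges",
    "M": "Main idea",
    "E": "Experiments",
    "L": "Limitations",
}


def parse_rating(code: str) -> Dict[str, int]:
    """Turn a rating code such as 'C3M2E1L1' into {'C': 3, 'M': 2, 'E': 1, 'L': 1}."""
    match = RATING_PATTERN.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a valid rating code: {code!r}")
    # Pair each aspect letter with its captured digit, in rubric order.
    return {letter: int(score) for letter, score in zip(ASPECTS, match.groups())}


if __name__ == "__main__":
    for letter, score in parse_rating("C3M2E1L1").items():
        print(f"{ASPECTS[letter]}: {score}/5")
```

A parser like this would make it possible to filter the list, for example to find papers whose Experiments score is still 1.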

Rubrics before `Y2021`:

Papers are rated on a (1) to (5) scale, where

(1) means I have only barely skimmed it or listened to the presentation.
(2) means (1) + I understand the main idea and the main contributions to the literature.
(3) means (2) + related works.
(4) means (3) + details of the experiments.
(5) means (4) + I feel confident that I understand almost everything about the paper.

In addition, (0) is used simply to indicate papers that are in the "toread" list.

Each note covers the C, M, and L aspects of the Y2021 rubric, plus takeaways.

Articles and research posts that are not published in a peer-reviewed conference or journal are also rated on a (1) to (5) scale to indicate how well I understand the content.

All papers are stored in my previous repo, papers, which is private because it contains papers that must be purchased. Papers that are publicly available without purchase can be found through the link next to each title.

Y2021 Y2020 Y2019

cedrickchee's awesome-ml-model-compression

xiaobai1217's Awesome-Video-Datasets

pliang279's awesome-multimodal-ml

jinwchoi's awesome-action-recognition

abhineet123's Deep-Learning-for-Tracking-and-Detection

Recent Notes

Y2023 Nov

Match Cutting: Finding Cuts with Smooth Visual Transitions, WACV 2023 C1M1E1L1 (Multimodal, Contrastive Learning) [paper arXiv] [code] [my repo]

Y2023 Oct

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017 C3M2E1L1 (Efficient NN) [paper arXiv] [code tf] [code] [my repo]
Experience: Adopting Indoor Outdoor Detection in On-demand Food Delivery Business, MobiCom 2022 C1M1E1L1 [paper] [my repo]
Can IoT Wearable Devices Feed Frugal Innovation?, FRUGALTHINGS 2022 C1M1E1L1 (IoT) [paper] [my repo]
Experience: Pushing Indoor Localization from Laboratory to the Wild, MobiCom 2022 C1M1E1L1 (Localization) [paper] [my repo]
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation, ICCV 2023 C2M1E1L1 (BEV, Knowledge Distillation) [paper] [code] [my repo]
Robot Learning with Sensorimotor Pre-training, 2023 C3M2E1L1 [paper arXiv] [website] [my repo]

Y2023 Sep

Towards Memory-Efficient Inference in Edge Video Analytics, HotEdgeVideo 2021 C3M3E1L2 [paper] [my repo]
HG-DAgger: Interactive Imitation Learning with Human Experts, ICRA 2019 C3M2E1L1 [paper arXiv] [paper] [my repo]
Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism, NeurIPS 2023 C1M1E1L1 [paper arXiv] [code]
INFaaS: Automated Model-less Inference Serving, USENIX 2021 C3M2E1L1 [paper] [video] [code] [my repo]
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text, 2023 C1M1E1L1 [paper arXiv] [dataset] [my repo]
DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback, 2018 C3M3E3L1 [paper arXiv] [video] [my repo]
Few-Shot Preference Learning for Human-in-the-Loop RL, CoRL 2022 C1M1E1L1 [paper] [paper arXiv] [website] [forum] [code] [my repo]
Boosting DNN Cold Inference on Devices, MobiSys 2023 C2M1E1L1 [paper] [paper arXiv] [my repo]
A Workload-Aware DVFS Robust to Concurrent Tasks for Mobile Devices, MobiCom 2023 C2M2E1L1 [paper] [my repo]
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, 2022 C2M1E1L1 (Language, RL) [paper arXiv] [website] [video] [code] [my repo]
Correcting Robot Plans with Natural Language Feedback, RSS 2022 C2M1E1L1 (Language, Robotics) [paper] [paper arXiv]
Topological Semantic Graph Memory for Image-Goal Navigation, CoRL 2022 C2M1E1L1 (Visual Navigation, Topological Graph) [paper] [paper arXiv] [forum] [code] [my repo]
Metric-Free Exploration for Topological Mapping by Task and Motion Imitation in Feature Space, RSS 2023 C1M1E1L1 (Visual Navigation) [paper] [paper arXiv] [website] [code] [my repo]
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments, NeurIPS 2020 C1M1E1L1 (Contrastive Learning, SSL) [paper] [paper arXiv] [code] [my repo]

Y2023 Aug

Deep Reinforcement Learning from Human Preferences, NIPS 2017 C2M1E1L1 (RL) [paper] [paper arXiv] [my repo]
From Cognitive Maps to Cognitive Graphs, PLOS 2014 C1M1E1L1 (Cognitive Graphs) [paper] [my repo]
No RL, No Simulation: Learning to Navigate without Navigating, NeurIPS 2021 C2M1E1L1 (Visual Navigation, RL) [paper] [paper arXiv] [website] [video] [code] [my repo]
One-4-All: Neural Potential Fields for Embodied Navigation, arXiv 2023 C1M1E1L1 (Visual Navigation, RL) [paper arXiv] [my repo]
FeUdal Networks for Hierarchical Reinforcement Learning, ICML 2017 C1M1E1L1 (RL) [paper arXiv] [paper] [my repo]
ViNG: Learning Open-World Navigation with Visual Goals, ICRA 2021 C2M1E1L1 (Visual Navigation) [paper arXiv] [website] [my repo]

Y2023 Jul

Learning both Weights and Connections for Efficient Neural Networks, NIPS 2015 C2M2E1L1 (Efficient NN, Pruning) [paper] [paper arXiv] [my repo]
Network In Network, 2014 C2M1E1L1 (Efficient NN) [paper arXiv] [my repo]
Pruning Filters for Efficient ConvNets, ICLR 2017 C3M3E2L1 (Efficient NN, Pruning) [paper] [paper arXiv] [forum] [code] [my repo]

Y2023 Jun

Understanding a Deep Neural Network Based on Neural-Path Coding, Access 2020 C3M3E1L1 [paper] [my repo]
Learning Efficient Convolutional Networks through Network Slimming, ICCV 2017 C2M2E1L1 [paper] [code] [my repo]
Avatar Poser: Articulated Full-Body Pose Tracking from Sparse Motion Sensing, ECCV 2022 C3M2E2L1 [note] [paper arXiv] [website] [code] [my repo]

Y2023 May

Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation, CVPR 2022 C1M1E1L1 [paper] [paper arXiv] [website] [my repo]
SMPL: A Skinned Multi-Person Linear Model, TOG 2015 C1M1E1L1 [paper] [paper pdf]
Physical Inertial Poser (PIP): Physics-aware Real-time Human Motion Tracking from Sparse Inertial Sensors, CVPR 2022 C1M1E1L1 [paper] [paper arXiv] [website] [my repo]
MoSh: Motion and Shape Capture from Sparse Markers, TOG 2014 C1M1E1L1 [paper] [paper pdf]
Learning a Pedestrian Social Behavior Dictionary, CHI 2023 C2M1E1L1 [paper arXiv] [my repo]
IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds, CHI 2023 C2M2E1L1 [paper arXiv] [paper] [my repo]
End-to-End Human Pose and Mesh Reconstruction with Transformers, CVPR 2021 C1M1E1L1 [paper arXiv] [paper] [my repo]
Deep Inertial Poser: Learning to Reconstruct Human Pose from Sparse Inertial Measurements in Real Time, TOG 2018 C3M3E1L1 [paper] [code] [my repo]
Augur: Modeling the Resource Requirements of ConvNets on Mobile Devices, TOMC 2021 C1M1E1L1 [paper] [my repo]

Y2023 Apr

Hardware-friendly Deep Learning by Network Quantization and Binarization, IJCAI 2021 C1M1E1L1 (Binary NN, Quantization) [paper] [my repo]
SiFall: Practical Online Fall Detection with RF Sensing, SenSys 2022 C1M1E1L1 (RF, Fall Detection) [paper] [my repo]
Person Re-Identification Using WiFi Signals, MobiCom 2022 C1M1E1L1 (Wi-Fi, Re-ID) [paper] [my repo]
Indoor Smartphone SLAM with Learned Echoic Location Features, SenSys 2022 C1M1E1L1 (SLAM) [paper] [my repo]
M4esh: mmWave-based 3D Human Mesh Construction for Multiple Subjects, SenSys 2022 C1M1E1L1 (mmWave, Human Mesh Construction) [paper] [website] [my repo]
Wi-Mesh: A WiFi Vision-based Approach for 3D Human Mesh Construction, SenSys 2022 C1M1E1L1 (Wi-Fi, Human Mesh Construction) [paper] [my repo]
Wi-Drone: Wi-Fi-based 6-DoF Tracking for Indoor Drone Flight Control, SenSys 2022 C3M3E1L1 (Wi-Fi) [paper] [my repo] [my YouTube] [my Bilibili]
GhostNet: More Features from Cheap Operations, CVPR 2020 C2M2E1L1 (Efficient NN) [paper arXiv] [my repo]
AutoMatch: Leveraging Traffic Camera to Improve Perception and Localization of Autonomous Vehicles, SenSys 2022 C2M2E1L1 (Autonomous Driving) [paper] [my repo]
VIPS: Real-Time Perception Fusion for Infrastructure-Assisted Autonomous Driving, MobiCom 2022 C2M2E1L1 (Autonomous Driving) [paper] [my repo]
The Implicit Bias of Gradient Descent on Separable Data, JMLR 2018 C3M3E1L3 (DL Theory) [paper arXiv] [my repo] [my YouTube] [my Bilibili]

Y2023 Mar

A ConvNet for the 2020s, CVPR 2022 C2M2E1L1 [paper] [my repo]
Efficient Training of Visual Transformers with Small Datasets, NeurIPS 2021 C1M1E1L1 [paper arXiv] [code] [my repo]
Intriguing Properties of Neural Networks, ICLR 2014 C1M1E1L1 [paper arXiv] [my repo]
Intriguing Properties of Vision Transformers, NeurIPS 2021 C3M2E1L1 (Transformer) [paper arXiv] [code] [my repo]
YOLOv7: Trainable Bag-of-freebies Sets New state-of-the-art for Real-time Object Detectors, CVPR 2023 C1M1E1L1 (Efficient NN, Object Detection) [paper arXiv] [code] [my repo]
Layerwise Class-Aware Convolutional Neural Network, TOCIASFVT 2017 C2M2E1L1 (Class-Aware) [paper] [my repo]
Feature Statistics Guided Efficient Filter Pruning, IJCAI 2020 C3M2E2L1 (Pruning) [paper arXiv] [my repo]
Class-Discriminative CNN Compression, ICPR 2022 C2M2E1L1 (Class-Aware Pruning) [paper arXiv] [my repo]
Network Dissection: Quantifying Interpretability of Deep Visual Representations, CVPR 2017 C2M2E1L1 (Class Activation Map) [paper] [my repo]
Learning Deep Features for Discriminative Localization, CVPR 2016 C2M2E1L1 (Class Activation Map) [paper arXiv] [my repo]
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization, ICCV 2017 C3M2E1L1 (Class Activation Map) [paper ICCV] [paper arXiv] [my repo]
CAP’NN: Class-Aware Personalized Neural Network Inference, DAC 2020 C3M2E1L1 (Class-Aware Pruning) [paper] [my repo]
Layer-adaptive Sparsity for the Magnitude-based Pruning, ICLR 2021 C2M1E1L1 (Pruning) [paper] [conf] [forum] [my repo]
A Fast Post-Training Pruning Framework for Transformers, NeurIPS 2022 C1M1E1L1 (Pruning, Transformer) [paper] [forum] [my repo]
NoScope: Optimizing Neural Network Queries over Video at Scale, VLDB 2017 C1M1E1L1 [paper] [code] [my repo]
Effectively Leveraging Attributes for Visual Similarity, ICCV 2021 C1M1E1L1 (Visual Similarity) [paper arXiv] [code] [my repo]
Dynamic Network Quantization for Efficient Video Inference, ICCV 2021 C1M1E1L1 (Efficient NN, Quantization) [paper arXiv] [code] [my repo]
Animal Recognition and Identification with Deep Convolutional Neural Networks for Automated Wildlife Monitoring, DSAA 2017 C1M1E1L1 [paper] [my repo]

Y2023 Feb

Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth, ICLR 2021 C3M3E2L1 (Representation Similarity) [paper arXiv] [forum] [code] [my repo]
Siamese Neural Networks for One-shot Image Recognition, ICML 2015 C3M3E2L1 (One-shot) [paper] [code] [my repo]
Memory-Efficient Learned Image Compression with Pruned Hyperprior Module, ICIP 2022 C3M3E2L1 (Efficient NN) [paper] [my repo]
Efficient Inference of Image-based Neural Network Models in Reconfigurable Systems with Pruning and Quantization, ICIP 2022 C1M1E1L1 (Efficient NN) [paper] [my repo]
Loss Landscapes and Optimization in Over-parameterized Non-linear Systems and Neural Networks, ACHA 2021 C2M1E1L1 (DL Theory) [paper arXiv] [code] [YouTube] [my repo]
Domain-invariant Feature Exploration for Domain Generalization, TMLR 2022 C2M1E1L1 (Domain Adaptation) [paper arXiv] [code] [forum] [my repo]
Digging Into Self-Supervised Monocular Depth Estimation, ICCV 2019 C2M2E1L1 (SSL, Depth Estimation) [paper arXiv] [code] [YouTube] [my repo]

Y2023 Jan

Cross Aggregation Transformer for Image Restoration, NeurIPS 2022 C3M2E1L1 (Image Restoration) [paper arXiv] [code] [my repo]
Pedestrian Trajectory Prediction in Heterogeneous Traffic using Facial Keypoints-based Convolutional Encoder-decoder Network, TOIT 2022 C3M3E1L1 (Trajectory Prediction) [paper] [my repo]
Smartphone-Based Indoor Visual Navigation with Leader-Follower Mode, TOSN 2021 C2M2E1L1 (SLAM) [paper] [my repo]
A Reference-free Evaluation Metric for Image Captioning, EMNLP 2021 C1M1E1L1 [paper arXiv] [code] [my repo]
Blurs Behave Like Ensembles: Spatial Smoothings to Improve Accuracy, Uncertainty, and Robustness, ICML 2022 C2M2E1L2 [paper] [code] [my repo]
S2-Transformer for Mask-Aware Hyperspectral Image Reconstruction, arXiv 2022 C2M2E2L1 (CASSI, SCI, HSI) [paper] [code] [my repo]

Y2022 Dec

Distream: Scaling Live Video Analytics with Workload-Adaptive Distributed Edge Intelligence, SenSys 2020 C1M1E1L1 (Video Analytics) [paper] [my repo]
AWStream: Adaptive Wide-Area Streaming Analytics, SIGCOMM 2018 C1M1E1L1 (Video Analytics) [paper] [my repo]
GaitSense: Towards Ubiquitous Gait-Based Human Identification with Wi-Fi, TOSN 2021 C2M1E1L1 (Wi-Fi, Identification, Gait) [paper] [my repo]
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training, NeurIPS 2022 C1M1E1L1 [paper] [paper arXiv] [forum] [code] [my repo]
Multiple People Identification Through Walls Using Off-the-Shelf WiFi, IoT 2020 C1M1E1L1 [paper] [my repo]
Learning Implicit Feature Alignment Function for Semantic Segmentation, ECCV 2022 C1M1E1L1 [paper arXiv] [my repo]
Real-Time, Accurate, and Consistent Video Semantic Segmentation via Unsupervised Adaptation and Cross-Unit Deployment on Mobile Device, CVPR 2022 C1M1E1L1 [paper] [my repo]
Panoptic, Instance and Semantic Relations: A Relational Context Encoder to Enhance Panoptic Segmentation, CVPR 2022 C2M1E1L1 [paper] [my repo]
Self-supervised Geometric Correspondence for Category-level 6D Object Pose Estimation in the Wild, ICLR 2023 C2M1E1L1 [paper] [forum] [my repo]
PieAPP: Perceptual Image-Error Assessment through Pairwise Preference, CVPR 2018 C1M1E1L1 [paper] [my repo]
Angle-Regulated Transformer Network for Pedestrian Trajectory Prediction, IJCAIW 2022 C1M1E1L1 [paper] [YouTube] [my repo]
SparseTT: Visual Tracking with Sparse Transformers, IJCAI 2022 C1M1E1L1 [paper] [code] [my repo]
MMT: Multi-Way Multi-Modal Transformer for Multimodal Learning, IJCAI 2022 C1M1E1L1 [paper] [my repo]

Y2022 Nov

How Do Vision Transformers Work?, ICLR 2022 C3M2E1L1 [paper] [paper arXiv] [forum] [code] [my repo] [my YouTube] [my Bilibili]
End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks, TransCybern 2022 C1M1E1L1 [paper] [paper arXiv] [my repo]
Structural Pruning via Latency-Saliency Knapsack, NeurIPS 2022 C2M1E1L1 [paper] [paper arXiv] [website] [my repo]

Y2022 Oct

Towards Streaming Perception, ECCV 2020 C3M1E1L1 [paper] [website] [YouTube] [Bilibili] [my repo]

Y2022 Sep

YOLOv7: Trainable Bag-of-freebies Sets New State-of-the-art for Real-time Object Detectors, 2022 C1M1E1L1 [paper] [code] [my repo]
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric, CVPR 2018 C2M1E1L1 [paper] [code] [my repo]
RoNIN: Robust Neural Inertial Navigation in the Wild: Benchmark, Evaluations, and New Methods, ICRA 2020 C3M2E1L1 [paper] [paper arXiv] [pwc] [website] [video] [code] [my repo]
Mobile-Former: Bridging MobileNet and Transformer, CVPR 2022 C1M1E1L1 (EfficientML, Transformer) [code] [paper] [my repo]
MiniViT: Compressing Vision Transformers with Weight Multiplexing, CVPR 2022 C1M1E1L1 (EfficientML, Transformer) [code] [paper] [my repo]
TinyViT: Fast Pretraining Distillation for Small Vision Transformers, ECCV 2022 C1M1E1L1 (EfficientML, Transformer) [code] [paper] [my repo]
Classifier Recommendation Using Data Complexity Measures, ICPR 2018 C1M1E1L1 (EfficientML) [paper] [my repo]

Y2022 Aug

Complexity of Representations in Deep Learning, ICPR 2022 C3M3E3L2 (EfficientML) [paper] [my repo]
MissFormer: (In-)attention-based Handling of Missing Observations for Trajectory Filtering and Prediction, ISVC 2021 C3M2E2L2 (EfficientML) [paper] [my repo]
Entropy-Constrained Training of Deep Neural Networks, IJCNN 2019 C1M1E1L1 (EfficientML) [paper] [my repo]
Entropy and Mutual Information in Models of Deep Neural Networks, NeurIPS 2018 C1M1E1L1 (EfficientML) [paper] [my repo]
Deep Learning and the Information Bottleneck Principle, ITW 2015 C2M2E1L1 (EfficientML) [paper] [my repo]
Combining Passive Visual Cameras and Active IMU Sensors for Persistent Pedestrian Tracking, JVisCommunImageRepresent 2017 C2M2E1L1 (IMU, Tracking) [paper] [my repo]

Y2022 Jul

YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs, WACV 2022 C2M2E2L1 (EfficientML) [code] [paper] [my repo]
Exploiting the Redundancy in Convolutional Filters for Parameter Reduction, WACV 2021 C2M2E1L1 (EfficientML) [code] [paper] [my repo]
Visual and Semantic Similarity in ImageNet, CVPR 2011 C2M2E1L1 (Visual Similarity) [paper] [my repo]
Learning Visual Similarity for Product Design with Convolutional Neural Networks, TOG 2015 C3M3E1L1 (Visual Similarity) [paper] [my repo]

Y2022 Jun

YOLOv4: Optimal Speed and Accuracy of Object Detection, arXiv 2020 C2M2E1L1 (EfficientML) [paper arXiv] [code] [my repo]
Focal Loss for Dense Object Detection, ICCV 2017 C1M1E1L1 (EfficientML) [paper] [paper arXiv] [code] [my repo]
LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition, NeurIPS 2019 C3M3E1L1 (EfficientML) [note] [paper] [my repo]
Tinier-YOLO: A Real-Time Object Detection Method for Constrained Environments, Access 2019 C4M4E2L1 (Object Detection, EfficientML) [note] [paper] [code] [my repo]

Y2022 May

PANDA: A Gigapixel-level Human-centric Video Dataset, CVPR 2020 C2M2E1L1 (Dataset, Object Detection, Tracking) [paper] [paper arXiv] [dataset] [my repo]
Flexible High-resolution Object Detection on Edge Devices with Tunable Latency, MobiCom 2021 C2M3E2L2 (EfficientML, Object Detection) [note] [paper] [my repo] [my YouTube] [my Bilibili]

Y2022 Apr

Vi-Fi: Associating Moving Subjects across Vision and Wireless Sensors, IPSN 2022 C4M4E3L3 (Multimodal, Association, IMU, WiFi, FTM) [paper] [dataset] [code] [my repo]

Y2022 Mar

Distilling Object Detectors with Fine-grained Feature Imitation, CVPR 2019 C3M3E1L1 (Efficient ML, Knowledge Distillation, Object Detection) [note] [paper] [code0] [code1] [my repo]
Visage: Enabling Timely Analytics for Drone Imagery, MobiCom 2021 C1M1E1L1 (Efficient ML, Edge Computing) [paper] [video] [my repo]
Elf: Accelerate High-resolution Mobile Deep Vision with Content-aware Parallel Offloading, MobiCom 2021 C2M1E1L1 (Efficient ML, Edge Computing) [paper] [video] [code] [my repo]
LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision, MobiCom 2021 C2M2E1L1 (Efficient ML, Filter, Pruning, Low Rank Decomposition, Knowledge Distillation) [note] [paper] [paper arXiv] [video] [my repo]
Learning Efficient Object Detection Models with Knowledge Distillation, NIPS 2017 C3M3E2L2 (Efficient ML, Knowledge Distillation) [note] [paper] [code0] [code1] [my repo]
Auto-scaling Vision Transformers Without Training, ICLR 2022 C2M1E1L1 (Efficient ML, NAS) [note] [paper] [forum] [code] [my repo]
CDNet: A Real-time and Robust Crosswalk Detection Network on Jetson Nano Based on YOLOv5, NeuralComputAppl 2022 C3M2E2L1 (Efficient ML, Object Detection) [paper] [code] [my repo]
Scaling Vision Transformers, arXiv 2021 C2M1E2L2 [note] [paper arXiv] [my repo]
Spectral Compressive Imaging Reconstruction Using Convolution and Spectral Contextual Transformer, arXiv 2022 C1M1E1L1 [paper arXiv] [my repo]
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction, CVPR 2022 C3M2E1L2 [paper arXiv] [my repo]
HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive Imaging, arXiv 2021 C1M1E1L1 [paper arXiv] [my repo]
Localization Distillation for Dense Object Detection, CVPR 2022 C2M2E1L1 [note] [paper arXiv] [code] [my repo]
This note is all you need (for Transformer video tutorials) C1M1E1L1 [Yannic Kilcher: Transformer, BERT] [codebasics: Word2Vec, BERT] [AI Coffee Break with Letitia: Transformer] [CodeEmporium: Transformer, BERT] [Leo Dirac: Transformer] [Henry AI Labs: BERT] [Jay Alammar: Transformer] [The AI Epiphany: Transformer0 Transformer1] [Hung-yi Lee: Transformer] [ChrisMcCormickAI: BERT [ep1] [ep4] [ep5] [ep6] [ep7] [ep8]]

Y2022 Feb

Deep Fusion of Appearance and Frame Differencing for Motion Segmentation, CVPRW 2021 C2M2E2L1 (Frame-differencing) [note] [paper] [dataset] [dataset GT] [my repo]
CDnet 2014: An Expanded Change Detection Benchmark Dataset, CVPRW 2014 C1M1E1L1 (Frame-differencing) [paper] [dataset] [my repo]
Decentralized Modular Architecture for Live Video Analytics at the Edge, MobiComW 2021 C1M1E1L1 [paper] [my repo]
Knowledge Distillation: A Survey, IJCV 2021 C1M1E1L1 [paper] [paper arXiv] [my repo]
Distilling the Knowledge in a Neural Network, NIPSW 2015 C2M2E2L2 [note] [paper] [paper arXiv] [code] [my repo]
Multi-Stage Progressive Image Restoration, CVPR 2021 C1M1E1L1 [paper] [paper arXiv] [code] [my repo]
A ConvNet for the 2020s, arXiv 2022 C1M1E1L1 [paper arXiv] [code] [my repo]
A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning, CVPR 2017 C1M1E1L1 [paper] [my repo]
Going Deeper with Convolutions, CVPR 2015 C2M1E1L1 [paper] [code] [my repo]
EfficientDet: Scalable and Efficient Object Detection, CVPR 2020 C3M3E3L1 (Efficient ML) [paper] [code] [my repo] [my YouTube] [my Bilibili]
A Hierarchical Approach for Associating Body-Worn Sensors to Video Regions in Crowded Mingling Scenarios, MM 2019 C1M1E1L1 (Multi-modal) [paper] [my repo]
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks, ECCV 2016 C1M1E1L1 (Binary NN) [paper] [code Lua] [code PyTorch] [my repo]
Binary Neural Networks: A Survey, PR 2020 C1M1E1L1 (Binary NN) [paper] [paper arXiv] [my repo]
FedDL: Federated Learning via Dynamic Layer Sharing for Human Activity Recognition, SenSys 2021 C3M2E1L1 (Federated Learning, IMU, HAR) [paper] [video] [my repo]
ClusterFL: A Similarity-Aware Federated Learning System for Human Activity Recognition, MobiSys 2021 C4M4E2L2 (Federated Learning, IMU, HAR) [note] [slide] [paper] [video] [my repo]

Y2022 Jan

Seeing Voices and Hearing Faces: Cross-modal biometric matching, CVPR 2018 C2M3E2L2 (Multi-modal) [paper] [website] [video] [dataset] [code] [my repo]
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, ICCV 2021 C1M1E1L1 [paper] [paper arXiv] [code] [my repo]
A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence, arXiv 2020 C1M1E1L1 [paper] [website] [my repo]
Video Analytics Gait Trend Measurement for Fall Prevention and Health Monitoring, ICPR 2020 C1M1E1L1 [paper] [my repo]
Verification: Accuracy Evaluation of WiFi Fine Time Measurements on an Open Platform, MobiCom 2018 C1M1E1L1 (FTM) [paper] [video] [my repo]
Zero-Shot Learning for IMU-Based Activity Recognition Using Video Embeddings, IMWUT 2021 C1M1E1L1 (IMU, Zero-Shot Learning, HAR) [paper] [my repo]

Y2021 Dec

Self-supervised Learning for Reading Activity Classification, IMWUT 2021 C1M1E1L1 (IMU, Self-Supervised Learning, HAR) [paper] [video] [my repo]

Y2021 Nov

Gaussian Error Linear Units, arXiv 2016 C2M2E1L1 [paper] [code] [my repo]
LIMU-BERT: Unleashing the Potential of Unlabeled Data for IMU Sensing Applications, Best paper runner-up SenSys 2021 C3M2E2L1 (IMU, Self-Supervised Learning, HAR, DPC) [paper] [code] [my repo] [my YouTube] [my Bilibili]
Deep Reinforcement Learning for Visual Object Tracking in Videos, arXiv 2017 C1M1E1L1 (Reinforcement Learning, Tracking) [paper] [code] [my repo]
Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning, CVPR 2017 C2M2E2L1 (Reinforcement Learning, Tracking) [paper] [my repo]
Masked Autoencoders Are Scalable Vision Learners, 2021 C2M2E2L1 [paper] [my repo]
DeepVIO: Self-supervised Deep Learning of Monocular Visual Inertial Odometry using 3D Geometric Constraints, IROS 2019 C1M1E1L1 (VIO) [paper] [my repo]

Y2021 Oct

Lossless Image Compression through Super-Resolution, 2020 C1M1E1L1 [paper] [code] [my repo]
When We First Met: Visual-Inertial Person Localization for Co-Robot Rendezvous, IROS 2020 C1M1E1L1 [paper] [my repo]
DeepFuse: An IMU-Aware Network for Real-Time 3D Human Pose Estimation from Multi-View Image, WACV 2020 C1M1E1L1 [paper] [my repo]
Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors, CVPR 2021 C1M1E1L1 [paper] [video] [code] [my repo]
Learning Sensor Interdependencies for IMU-to-Segment Assignment, Access 2021 C2M2E2L2 (Vis, IMU, Multi-modal) [paper] [my repo]
mID: Tracking and Identifying People with Millimeter Wave Radar, DCOSS 2019 C2M2E1L1 [paper] [my repo]

Y2021 Sep

OAS-Net: Occlusion Aware Sampling Network for Accurate Optical Flow, ICASSP 2021 C1M1E1L1 [paper] [my repo]
GROOT: A Real-time Streaming System of High-Fidelity Volumetric Videos, MobiCom 2020 C2M2E1L1 [paper] [my repo]
Image Super-Resolution Using Very Deep Residual Channel Attention Networks, ECCV 2018 C1M1E1L1 (Super-Resolution) [paper] [code] [my repo]
Single Image Super-Resolution via a Holistic Attention Network, ECCV 2020 C3M2E1L1 (Super-Resolution) [paper] [code] [my repo]
When Video meets Inertial Sensors: Zero-shot Domain Adaptation for Finger Motion Analytics with Inertial Sensors, IoTDI 2021 C1M1E1L1 (IMU, HAR) [paper] [my repo]
Toward Cooperative Localization of Wearable Sensors using Accelerometer and Camera, INFOCOM 2010 C1M1E1L1 (Vis, IMU, Multi-modal) [paper] [my repo]
EV-Loc: Integrating Electronic and Visual Signals for Accurate Localization, MobiHoc 2012, Trans Netw 2014 C3M2E3L3 (Vis, WiFi, Multi-modal) [paper0] [paper1] [my repo0] [my repo1]
Total Capture: 3D Human Pose Estimation Fusing Video and Inertial Sensors, BMVC 2017 C3M3E3L3 [paper] [website] [my repo]
IDIoT: Towards Ubiquitous Identification of IoT Devices through Visual and Inertial Orientation Matching During Human Activity, IoTDI 2020 C2M1E2L2 (IMU, HAR) [paper] [code] [my repo]

Y2021 Aug

Deep Tensor ADMM-Net for Snapshot Compressive Imaging, ICCV 2019 C1M1E1L1 (SCI) [paper] [code] [my repo]
GAP-net for Snapshot Compressive Imaging C2M2E2L1 (SCI) [paper] [code] [my repo]
Deep Plug-and-play Priors for Spectral Snapshot, OSA 2021 C1M1E1L1 (SCI) [paper] [code] [my repo]
Class-Aware Domain Adaptation for Semantic Segmentation of Remote Sensing Images, TGRS 2020 C1M1E1L1 (UDA) [paper] [my repo]
EdgeCompression: An Integrated Framework for Compressive Imaging Processing on CAVs, SEC 2020 C3M3E1L2 (SCI, Object Detection) [paper] [code] [my repo]
Plug-and-Play Algorithms for Video Snapshot Compressive Imaging C2M2E2L2 (SCI) [paper] [code] [my repo]
Gate-ID: WiFi-Based Human Identification Irrespective of Walking Directions in Smart Home, IoT 2021 C2M2E2L2 (WiFi) [paper] [my repo]
Teaching RF to Sense without RF Training Measurements, IMWUT 2020 C4M4E4L3 (RF, WiFi, HAR) [slide] [paper] [my repo] [my YouTube] [my Bilibili]
FiDo: Ubiquitous Fine-Grained WiFi-based Localization for Unlabelled Users via Domain Adaptation, WWW 2020 C1M1E1L1 (WiFi) [paper] [my repo]
Memory-Efficient Network for Large-scale Video Compressive Sensing, CVPR 2021 C1M1E1L1 (SCI) [paper] [code] [my repo]
The Sound of Motions, ICCV 2019 C1M1E1L1 (Multi-modal) [paper] [my repo]
Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving, CVPR 2019 C1M1E1L1 (3D_Detection) [paper] [code] [my repo]
Monocular Bird’s-Eye-View Semantic Segmentation for Autonomous Driving (1) [blog]
XModal-ID: Using WiFi for Through-Wall Person Identification from Candidate Video Footage, MobiCom 2019 C1M1E1L1 (WiFi) [paper] [my repo]
RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening, CVPR 2021 C1M1E1L1 (UDA) [paper] [code] [my repo]
Let There Be IMU Data: Generating Training Data for Wearable, Motion Sensor Based Activity Recognition from Monocular RGB Videos, UbiComp 2019 C1M1E1L1 (IMU) [paper] [my repo]
IMUTube: Automatic Extraction of Virtual on-body Accelerometry from Video for Human Activity Recognition, IMWUT 2020 C1M1E1L1 (IMU) [paper] [my repo]

Y2021 Jul

Enabling Public Cameras to Talk to the Public, IMWUT 2018 C1M1E1L1 (IMU) [paper] [my repo]
Closing the Gaps in Inertial Motion Tracking, MobiCom 2018 C2M1E1L1 (IMU) [paper] [my repo]
BIRNAT: Bidirectional Recurrent Neural Networks with Adversarial Training for Video Snapshot Compressive Imaging, ECCV 2020 C2M2E1L2 (SCI) [paper] [code] [my repo]
MOTS: Multi-Object Tracking and Segmentation, CVPR 2019 C1M1E1L1 (Tracking) [paper] [website] [my repo]
Towards Real-Time Multi-Object Tracking, ECCV 2020 C3M3E1L1 (Tracking) [paper] [code] [my repo]
MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation, CVPR 2021 C2M2E1L1 (UDA) [paper] [code] [my repo]

Y2021 Jun

CenterNet: Keypoint Triplets for Object Detection, ICCV 2019 C1M1E1L1 (Detection) [paper] [code] [my repo]
Tracking Objects as Points, ECCV 2020 C1M2E1L1 (Tracking) [paper] [code] [my repo]

Y2021 May

Hierarchical Matching of 3D Pedestrian Trajectories for Surveillance Applications, AVSS 2009 C1M2E2L1 [paper] [my repo]
SenseHAR: A Robust Virtual Activity Sensor for Smartphones and Wearables, SenSys 2019 C2M2E1L2 [paper] [my repo]
ATOM: Accurate Tracking by Overlap Maximization, CVPR 2019 C1M1E1L1 [paper] [code] [my repo]
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking, CVPR 2021 C1M2E1L1 [paper] [code] [my repo]
Dynamic Routing Between Capsules, NIPS 2017 C1M1E1L1 [paper] [video Aurélien] [video Yannic] [video Sara] [code Keras] [code TensorFlow] [code PyTorch] [my repo]
Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition, CHI 2021 C2M3E1L1 [paper] [video] [code] [my repo]

Y2021 Apr

Selective Sensor Fusion for Neural Visual-Inertial Odometry, CVPR 2019 C1M1E1L1 [paper] [my repo]
Combining Passive Visual Cameras and Active IMU Sensors to Track Cooperative People, FUSION 2015 C1M1E1L1 [paper] [my repo]
Things in the air: tagging wearable IoT information on drone videos, IoT 2021 C1M1E1L1 [paper] [my repo]
Who Goes There? Exploiting Silhouettes and Wearable Signals for Subject Identification in Multi-Person Environments, ICCVW 2019 C1M1E1L1 [paper] [my repo]
Visually Fingerprinting Humans without Face Recognition, MobiSys 2015 C1M1E1L1 [paper] [my repo]
Towards City-Scale Smartphone Sensing of Potentially Unsafe Pedestrian Movements, MASS 2014 C1M1E1L1 [paper] [my repo]
Towards Robust Vehicular Context Sensing, TVT 2018 C1M1E1L1 [paper] [my repo]
Recognizing Textures with Mobile Cameras for Pedestrian Safety Applications, TMC 2018 C1M1E1L1 [paper] [my repo]
FusionEye: Perception Sharing for Connected Vehicles and its Bandwidth-Accuracy Trade-offs, SECON 2019 C1M1E1L1 [paper] [my repo]
Gradient Profiling for Pedestrian Services, MobiSys 2015 C1M1E1L1 [paper] [my repo]
LookUp: Enabling Pedestrian Safety Services via Shoe Sensing, MobiSys 2015 C1M1E1L1 [paper] [my repo]
Efficient Speaker Naming via Deep Audio-Face Fusion and End-to-End Attention Model, ACPR 2017 C1M1E1L1 [paper] [my repo]
StructVIO: Visual-Inertial Odometry With Structural Regularity of Man-Made Environments, T-RO 2019 C1M1E1L1 [paper] [my repo]
Atlanta World: An Expectation Maximization Framework for Simultaneous Low-level Edge Grouping and Camera Calibration in Complex Man-made Environments, CVPR 2004 C1M1E1L1 [paper] [my repo]
StructSLAM: Visual SLAM With Building Structure Lines, TVT 2015 C1M1E1L1 [paper] [my repo]
Attention Guided Deep Audio-face Fusion for Efficient Speaker Naming, PR 2018 C1M1E1L1 [paper] [my repo]
TracKlinic: Diagnosis of Challenge Factors in Visual Tracking, WACV 2021 C1M1E1L1 [paper] [my repo]
RGB-D Scene Labeling with Multimodal Recurrent Neural Networks, CVPRW 2017 C1M1E1L1 [paper] [my repo]
Encoding Color Information for Visual Tracking: Algorithms and Benchmark, TIP 2015 C1M1E1L1 [paper] [my repo]
LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking, CVPR 2019 C1M1E1L1 [paper] [my repo]
Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking, ICCV 2017 C1M1E1L1 [paper] [my repo]
Multiple Source Data Fusion via Sparse Representation for Robust Visual Tracking, FUSION 2011 C1M1E1L1 [paper] [my repo]
Robust and Efficient Graph Correspondence Transfer for Person Re-Identification, TIP 2021 C1M1E1L1 [paper] [my repo]
Deep Learning Approach to Fourier Ptychographic Microscopy, Optica 2018 C3M3E1L1 [paper] [my repo]
Deep Speckle Correlation: A Deep Learning Approach Toward Scalable Imaging Through Scattering Media, Optica 2018 C3M2E1L1 [paper] [my repo]
Multi-Modal Fusion Transformer for End-to-End Autonomous Driving, CVPR 2021 C2M2E1L1 [paper] [code] [my repo]
Pointillism: Accurate 3D Bounding Box Estimation with Multi-Radars, SenSys 2020 C2M4E2L1 [paper] [my repo] [my YouTube] [my Bilibili]
Creating Spatio-temporal Spectrum Maps from Sparse Crowdsensed Data, WCNC 2019 C3M2E2L1 [paper] [my repo]
Bringing Old Photos Back to Life, CVPR 2020 C2M2E1L2 [paper] [website] [video] [code] [colab] [my repo]
milliEgo: Single-chip mmWave Radar Aided Egomotion Estimation via Deep Sensor Fusion, SenSys 2020 C3M3E1L2 [paper] [code] [my repo]

Y2021 Mar

Semi-Supervised Crowd Counting via Self-Training on Surrogate Tasks, ECCV 2020 C2M2E1L1 [paper] [my repo]
Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case C1M2E1L1 [paper] [my repo]
3D Human Pose Estimation with Spatial and Temporal Transformers C2M2E2L2 [paper] [video] [code] [my repo]
Capturing the Human Figure Through a Wall, TOG 2015 C2M2E1L1 [paper] [my repo]
Widar2.0: Passive Human Tracking with a Single Wi-Fi Link, MobiSys 2018 C1M1E1L1 [paper] [my repo]
Widar: Decimeter-Level Passive Tracking via Velocity Monitoring with Commodity Wi-Fi, MobiHoc 2017 C1M1E1L1 [paper] [my repo]
Position Tracking for Virtual Reality Using Commodity WiFi, CVPR 2017 C2M2E1L2 [paper] [my repo]
Spectrum Patrolling With Crowdsourced Spectrum Sensors, TCCN 2020 C2M1E1L1 [paper] [my repo]
MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive Sensing, CVPR 2021 C1M1E1L1 [paper] [code] [my repo]
Annotation Generation From IMU-Based Human Whole-Body Motions in Daily Life Behavior, TransHumMachSyst 2020 C2M2E2L2 [paper] [my repo]
I am a Smartwatch and I can Track my User’s Arm, MobiSys 2016 C2M2E1L2 [paper] [my repo]
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021 C3M3E1L2 [paper] [forum] [Yannic Kilcher: video] [The AI Epiphany: [video] [code]] [code] [my repo]
Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking C2M2E1L1 [paper] [my repo]
Ear-AR: Indoor Acoustic Augmented Reality on Earphones, MobiCom 2020 C2M2E2L2 [paper] [my repo]
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications, SP 2020 M1 [paper] [my repo]
Towards 3D Human Pose Construction Using WiFi, MobiCom 2020 C1M1E1L1 [paper] [my repo]
Dense Multimodal Fusion for Hierarchically Joint Representation, ICASSP 2019 C3M2E2L2 [paper] [my repo]
RGB-W: When Vision Meets Wireless, ICCV 2015 C2M2E1L1 [paper] [my repo]
Simultaneous Identification and Tracking of Multiple People using Video and IMUs, CVPRW 2019 C2M2E2L2 [paper] [my repo]
Accurate Long-Term Multiple People Tracking Using Video and Body-Worn IMUs, TIP 2020 C3M3E2L2 [paper] [my repo]
S3K: Self-Supervised Semantic Keypoints for Robotic Manipulation via Multi-View Consistency, CoRL 2020 C2M2E1L1 [paper] [website0] [website1] [video] [my repo]
Text-to-Image Generation Grounded by Fine-Grained User Attention, WACV 2021 C1M1E1L1 [paper] [my repo]
Multi-modal Discriminative Model for Vision-and-Language Navigation, ACL 2019 C1M1E1L1 [paper] [my repo]
Hierarchical Self-Attention Network for Action Localization in Videos, ICCV 2019 C2M3E1L1 [paper] [my repo]

Y2021 Feb

Focal Visual-Text Attention for Visual Question Answering, CVPR 2018 C1M1E1L1 [paper] [website] [video] [code] [my repo]
Attentional Separation-and-Aggregation Network for Self-supervised Depth-Pose Learning in Dynamic Scenes, CoRL 2020 C1M1E1L1 [paper] [website] [video] [my repo]
Layer Normalization C2M2E1L1 [paper] [my repo]
Multi-Modality Cross Attention Network for Image and Sentence Matching, CVPR 2020 C1M2E1L1 [paper] [my repo]
Multimodal Machine Learning: A Survey and Taxonomy, TPAMI 2019 M2 [paper] [my repo]
Stand-Alone Self-Attention in Vision Models, NeurIPS 2019 C1M1E1L1 [paper] [my repo]
Hierarchical Robot Navigation in Novel Environments using Rough 2-D Maps, CoRL 2020 C1M1E1L1 [paper] [website] [video] [my repo]
Multi-modal Transformer for Video Retrieval, ECCV 2020 C1M1E1L1 [paper] [my repo]
AMC: Attention guided Multi-modal Correlation Learning for Image Search, CVPR 2017 C2M1E1L1 [paper] [code] [my repo]
Attention? Attention!, Lil'Log 2 [blog]
The Illustrated Transformer 4 [blog0] [blog1] [blog2] [code]
Universal Embeddings for Spatio-Temporal Tagging of Self-Driving Logs, CoRL 2020 C1M1E1L1 [paper] [website] [video] [my repo]
View-Invariant Probabilistic Embedding for Human Pose, ECCV 2020 C2M2E1L1 [paper] [website] [code] [my repo]
Dual-modality Seq2Seq Network for Audio-visual Event Localization, ICASSP 2019 C4M4E1L1 [paper] [my repo]
Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition, Sensors 2016 C1M1E1L1 [paper] [my repo]
Attention Is All You Need, NIPS 2017 C2M3E1L1 [paper] [video] [code] [my repo]
Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks, T-RO 2020 C1M1E1L1 [paper] [my repo]
Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018 C1M1E1L1 [paper] [my repo]
Connecting Vision and Language with Localized Narratives, ECCV 2020 C1M1E1L1 [paper] [website] [video0] [video1] [code] [my repo]
Temporal Cycle-Consistency Learning, CVPR 2019 C2M1E1L1 [paper] [website] [video] [code] [colab] [my repo]
DIRL: Domain-Invariant Representation Learning for Sim-to-Real Transfer, CoRL 2020 C1M1E1L1 [paper] [website] [video] [code] [my repo]
PseudoSeg: Designing Pseudo Labels for Semantic Segmentation, ICLR 2021 C1M1E1L1 [paper] [forum] [code] [my repo]
Deep Multimodal Representation Learning: A Survey, Access 2019 C3M2E1L1 [paper] [my repo]
Deep Multimodal Representation Learning from Temporal Data, CVPR 2017 C3M2E1L1 [paper] [my repo]
A Novel Walking Detection and Step Counting Algorithm Using Unconstrained Smartphones, Sensors 2018 C1M1E1L1 [paper] [my repo]
See, Hear, Explore: Curiosity via Audio-Visual Association, NeurIPS 2020 C3M2E2L1 [paper] [website] [video] [code] [my repo]
Blind Image Quality Evaluation Using Perception Based Features, NCC 2015 C1M1E1L1 [paper] [pip] [my repo]
End-to-End Low Cost Compressive Spectral Imaging with Spatial-Spectral Self-Attention, ECCV 2020 C3M2E2L1 [paper] [code0] [code1] [my repo]
Collaborative Deep Reinforcement Learning for Multi-Object Tracking, ECCV 2018 C1M1E1L1 [paper] [my repo]
Unsupervised Correlation Analysis, CVPR 2018 C2M2E1L1 [paper] [my repo]

Y2021 Jan

Tracking without bells and whistles, ICCV 2019 C1M1E1L1 [paper] [code] [my repo]
FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking C1M1E1L1 [paper] [code] [my repo]
LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking, CVPRW 2020 C3M3E1L3 [note] [paper] [code] [my repo]
Unsupervised Domain Adaptation by Backpropagation, ICML 2015 C1M1E1L1 [paper] [my repo]
IONet: Learning to Cure the Curse of Drift in Inertial Odometry, AAAI 2018 C1M1E1L1 [paper] [my repo]
Deep Neural Network Based Inertial Odometry Using Low-cost Inertial Measurement Units, TMC 2019 C3M3E2L2 [paper] [video] [my repo]
Deep-Learning-Based Pedestrian Inertial Navigation: Methods, Data Set, and On-Device Inference, IoT 2020 C1M1E1L1 [paper] [my repo]
OxIOD: The Dataset for Deep Inertial Odometry, TR 2019 C5M4E1L1 [note] [paper] [dataset] [my repo]
Plug-and-Play Algorithms for Large-scale Snapshot Compressive Imaging, CVPR 2020 C3M2E1L2 [paper] [code] [my repo]
Rank Minimization for Snapshot Compressive Imaging, TPAMI 2019 C1M1E1L1 [paper] [code] [my repo]
λ-net: Reconstruct Hyperspectral Images from a Snapshot Measurement, ICCV 2019 C3M2E1L2 [paper] [code] [my repo]
End-to-End Learning Framework for IMU-Based 6-DOF Odometry, Sensors 2019 C3M3E2L2 [paper] [my repo]
One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL, NeurIPS 2020 C1M1E1L1 [paper] [my repo]
Multi-modal Active Learning From Human Data: A Deep Reinforcement Learning Approach, ICMI 2019 C1M1E1L1 [paper] [my repo]
Adaptive Dynamic Bipartite Graph Matching: A Reinforcement Learning Approach, ICDE 2019 C1M1E1L1 [paper] [my repo]
One Way Distance: For Shape Based Similarity Search of Moving Object Trajectories, ACMGIS 2005 C1M1E1L1 [paper] [my repo]
AI-IMU Dead-Reckoning, IV 2020 C2M2E1L1 [paper] [code] [my repo]
On Map-Matching Vehicle Tracking Data, VLDB 2005 C1M1E1L1 [paper] [my repo]
EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching, DCOSS 2020 C1M2E1L2 [paper] [my repo]
An Efficiently Computable Metric for Comparing Polygonal Shapes, TPAMI 1991 C1M1E1L1 [paper] [my repo]

2020 Dec

Wi-Go: Accurate and Scalable Vehicle Positioning using WiFi Fine Timing Measurement, MobiSys 2020 [paper] [my repo] (1)
Active Vision for Early Recognition of Human Actions, CVPR 2020 [paper] [my repo] (1)

2020 Nov

Position-aware Graph Neural Networks, ICML 2019 [paper] [my repo] (1)

2020 Oct

Recurrent Space-time Graph Neural Networks, NeurIPS 2019 [paper] [code] [my repo] (1)
3D Graph Neural Networks for RGBD Semantic Segmentation, ICCV 2017 [paper] [my repo] (1)
A Behavioral Approach to Visual Navigation with Graph Localization Networks, RSS 2019 [paper] [blog] [video] [code] [my repo] (1)
Zero-Shot Multi-View Indoor Localization via Graph Location Networks, MM 2020 [paper] [code] [my repo] (1)
SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation, ICCV 2019 [paper] [code] [my repo] (1)
A Comprehensive Survey on Graph Neural Networks, NNLS 2021 [paper] [my repo] (1)
MotionTransformer: Transferring Neural Inertial Tracking between Domains, AAAI 2019 [paper] [my repo] (1)

2020 Sep

Enabling Identity-Aware Tracking via Fusion of Visual and Inertial Features, ICRA 2019 [paper] [my repo] (1)
Person Re-ID by Fusion of Video Silhouettes and Wearable Signals for Home Monitoring Applications, Sensors 2020 [paper] [my repo] (1)
A Survey of Human-Sensing: Methods for Detecting Presence, Count, Location, Track, and Identity, CS 2010 [paper] [my repo] (1)
Automatic Synchronization of Markerless Video and Wearable Sensors for Walking Assessment, Sensors 2019 [paper] [my repo] (1)
Tasking Networked CCTV Cameras and Mobile Phones to Identify and Localize Multiple People, UbiComp 2010 C3M3E3L3 (Vis, IMU, Multi-modal) [paper] [my repo]
Tagging Wearable Accelerometers in Camera Frames through Information Translation between Vision Sensors and Accelerometers, ICCPS 2019 [paper] [my repo] (2)

2020 Jul

Self-Supervised Learning

Self-Supervised Dueling Networks for Deep Reinforcement Learning, NIPS 2016 [paper] [my repo] (1)
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey, TPAMI 2020 [paper] [my repo] (1)
CURL: Contrastive Unsupervised Representations for Reinforcement Learning, ICML 2020 [paper] [code] [my repo] (1)
Unsupervised Domain Adaptation through Self-Supervision - REJECT, ICLR 2020 [paper] [forum] [my repo] (1)
S4L: Self-Supervised Semi-Supervised Learning, ICCV 2019 [paper ICCV] [paper arxiv] [my repo] (1)
Self-Supervised Representation Learning, Lil'Log 2019 [blog] (1)

Prioritized Experience Replay, ICLR 2016 [paper] [my repo] (1)
End-to-End Robotic Reinforcement Learning without Reward Engineering, RSS 2019 [paper] [my repo] (1)
Meta-Sim: Learning to Generate Synthetic Datasets, ICCV 2019 [paper ICCV] [paper arxiv] [blog] [code] [my repo] (1)
SIM2Real Transfer [slide] (1)
VR-Goggles for Robots: Real-to-sim Domain Adaptation for Visual Control [paper] [my repo] (1)
CycleGAN for sim2real Domain Adaptation [paper] [video] [my repo] (1)
Learning to Simulate, ICLR 2019 [paper] [forum] [my repo] (1)
Policy Transfer With Strategy Optimization, ICLR 2019 [paper] [forum] [my repo] (2)
Domain Randomization for Sim2Real Transfer [blog] (1)
RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real, CVPR 2020 [paper CVPR] [paper arxiv] [my repo] (1)
ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation, CVPR 2019 [paper] [video] [code] [my repo] (1)
Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision, CVPR 2020 C3M4E1L2 [paper] [blog] [video] [code] [my repo] (3)
A Visual Guide to Evolution Strategies 2019 [blog] (2)

2020 May

Graph Neural Network

GCAN: Graph Convolutional Adversarial Network for Unsupervised Domain Adaptation, CVPR 2019 [paper] [my repo] (1)
Symmetric Graph Convolutional Autoencoder for Unsupervised Graph Representation Learning, ICCV 2019 [paper] [my repo] (1)
Semantic Graph Convolutional Networks for 3D Human Pose Regression, CVPR 2019 [paper] [my repo] (1)
A Convex Relaxation for Multi-Graph Matching, CVPR 2019 [paper] [my repo] (1)
Edge-Labeling Graph Neural Network for Few-shot Learning, CVPR 2019 [paper] [my repo] (1)

2020 Apr

Neural Architecture Search

Neural Architecture Search with Reinforcement Learning, ICLR 2017 [paper] [my repo] (1)
DARTS: Differentiable Architecture Search, ICLR 2019 [paper] [my repo] (2)
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, ICML 2019 C3M3E3L1 [paper] [my repo]
Exploring Randomly Wired Neural Networks for Image Recognition, ICCV 2019 [paper] [my repo] (1)
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures, ICLR 2020 [paper] [my repo] (2)

Flow Models

Glow: Generative Flow with Invertible 1×1 Convolutions, NeurIPS 2018 [paper] [my repo] (1)
C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds, CVPR 2020 [paper] [my repo] (1)
Implicit Generation and Modeling with Energy-Based Models, NeurIPS 2019 [paper] [my repo] (1)
Your Classifier Is Secretly an Energy Based Model and You Should Treat It Like One, ICLR 2020 [paper] [my repo] (1)
Do Deep Generative Models Know What They Don't Know?, ICLR 2019 [paper] [my repo] (1)

Optimal Transport

Wasserstein GAN, ICML 2017 [paper] [my repo] (1)
Improved Training of Wasserstein GANs, NIPS 2017 [paper] [my repo] (1)
Spectral Normalization for Generative Adversarial Networks, ICLR 2018 [paper] [my repo] (1)

Visual SLAM

ORB-SLAM: a Versatile and Accurate Monocular SLAM System, T-RO 2015 [paper] [my repo] (1)
DTAM: Dense Tracking and Mapping in Real-Time, ICCV 2011 [paper] [my repo] (1)
Learning Meshes for Dense Visual SLAM, ICCV 2019 [paper] [my repo] (1)

Scene Text Recognition

Convolutional Character Networks, ICCV 2019 [paper] [my repo] (1)
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network, CVPR 2020 [paper] [my repo] (1)
Real-time Scene Text Detection with Differentiable Binarization, AAAI 2020 [paper] [my repo] (1)
TextField: Learning A Deep Direction Field for Irregular Scene Text Detection, TIP 2019 [paper] [my repo] (1)

2020 Mar

Hindsight Experience Replay, NIPS 2017 [paper] [video] [my repo] (1)

Self-Supervised Learning

Unsupervised Visual Representation Learning by Context Prediction, ICCV 2015 [paper] [my repo] (1)
Context Encoders: Feature Learning by Inpainting, CVPR 2016 [paper] [my repo] (1)
Shuffle and Learn: Unsupervised Learning using Temporal Order Verification, ECCV 2016 [paper] [my repo] (1)
Learning and Using the Arrow of Time, CVPR 2018 [paper] [my repo] (1)
Grasp2Vec: Learning Object Representations from Self-Supervised Grasping, CoRL 2018 [paper] [my repo] (1)
Evolving Losses for Unsupervised Video Representation Learning, CVPR 2020 [paper] [my repo] (1)

2020 Feb

Crowd Counting

Leveraging Unlabeled Data for Crowd Counting by Learning to Rank, CVPR 2018 [paper] [my repo] (1)
Bayesian Loss for Crowd Count Estimation with Point Supervision, ICCV 2019 [paper] [my repo] (1)

Video Representation

Temporal Gaussian Mixture Layer for Videos, ICML 2019 [paper] [slide] [code] [my repo] (1)
Videos as Space-Time Region Graphs, ECCV 2018 [paper] [my repo] (1)
Non-local Neural Networks, CVPR 2018 [paper] [code] [my repo] (1)
End-to-end Learning of Action Detection from Frame Glimpses in Videos, CVPR 2016 C2M2E1L1 [paper] [blog] [code] [my repo]
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, CVPR 2017 [paper] [my repo] (1)

Tracking

Cross-Classification Clustering: An Efficient Multi-Object Tracking Technique for 3-D Instance Segmentation in Connectomics, CVPR 2019 [paper] [my repo] (1)
Physical Adversarial Textures That Fool Visual Object Tracking, ICCV 2019 [paper] [my repo] (1)
Efficient Online Multi-Person 2D Pose Tracking with Recurrent Spatio-Temporal Affinity Fields, CVPR 2019 [paper] [my repo] (1)
Learning Aberrance Repressed Correlation Filters for Real-Time UAV Tracking, ICCV 2019 [paper] [code] [my repo] (1)
Learning Discriminative Model Prediction for Tracking, ICCV 2019 [paper] [code] [my repo] (1)

Generative Models

SinGAN: Learning a Generative Model from a Single Natural Image, ICCV 2019 [paper] [code] [my repo] (1)
A Style-Based Generator Architecture for Generative Adversarial Networks, CVPR 2019 [paper] [code] [my repo] (1)
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017 [paper] [blog] [code] [my repo] (1)
Generating Diverse High-Fidelity Images with VQ-VAE-2, NeurIPS 2019 [paper] [my repo] (1)
Autoencoding Beyond Pixels using a Learned Similarity Metric, ICML 2016 [paper] [my repo] (1)
Auto-Encoding Variational Bayes, ICLR 2014 [paper] [video] [my repo] (1)

Contextual Imagined Goals for Self-Supervised Robotic Learning, CoRL 2019 [paper] [blog] [video] [code] [my repo] (1)
HRL4IN: Hierarchical Reinforcement Learning for Interactive Navigation with Mobile Manipulators, CoRL 2019 [paper] [blog] [code] [my repo] (1)
Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning, CoRL 2019 [paper] [blog] [my repo] (1)
Learning Navigation Subroutines from Egocentric Videos, CoRL 2019 [paper] [blog] [video] [my repo] (1)
ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots, CoRL 2019 [paper] [blog] [my repo] (1)
RoboNet: Large-Scale Multi-Robot Learning, CoRL 2019 [paper] [blog] [video] [code] [my repo] (1)
Adversarial Active Exploration for Inverse Dynamics Model Learning, CoRL 2019 [paper] [my repo] (1)
Learning Locomotion Skills for Cassie: Iterative Design and Sim-to-Real, CoRL 2019 [paper] [my repo] (1)
Learning Latent Plans from Play, CoRL 2019 [paper] [blog] [video] [my repo] (1)
To Follow or not to Follow: Selective Imitation Learning from Observations, CoRL 2019 [paper] [blog] [video] [my repo] (1)
Active Domain Randomization, CoRL 2019 [paper] [code] [my repo] (1)
TuneNet: One-Shot Residual Tuning for System Identification and Sim-to-Real Robot Task Transfer, CoRL 2019 [paper] [video] [code] [my repo] (1)
A Survey on Reproducibility by Evaluating Deep Reinforcement Learning Algorithms on Real-World Robots, CoRL 2019 [paper] [code] [my repo] (1)
Asynchronous Methods for Model-Based Reinforcement Learning, CoRL 2019 [paper] [my repo] (1)
Learning to Manipulate Object Collections Using Grounded State Representations, CoRL 2019 [paper] [blog] [code] [video0] [video1] [my repo] (1)
Multi-Frame GAN: Image Enhancement for Stereo Visual Odometry in Low Light, CoRL 2019 [paper] [my repo] (1)
Not All Areas Are Equal: Transfer Learning for Semantic Segmentation via Hierarchical Region Selection, CVPR 2019 [paper] [my repo] (1)
CRAVES: Controlling Robotic Arm with a Vision-based Economic System, CVPR 2019 [paper] [my repo] (1)
Pyramid Feature Attention Network for Saliency detection, CVPR 2019 [paper] [my repo] (1)
Universal Domain Adaptation, CVPR 2019 [paper] [code] [my repo] (1)
Bidirectional Learning for Domain Adaptation of Semantic Segmentation, CVPR 2019 [paper] [code] [my repo] (1)
Representation Similarity Analysis for Efficient Task taxonomy & Transfer Learning, CVPR 2019 [paper] [code] [my repo] (1)
Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks, CVPR 2019 [paper] [video] [my repo] (1)
Gaussian Temporal Awareness Networks for Action Localization, CVPR 2019 [paper] [my repo] (1)
Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation, CVPR 2019 [paper] [code] [my repo] (1)

2020 Jan

Temporal Cycle-Consistency Learning, CVPR 2019 [paper] [my repo] (1)
Actor-Critic Instance Segmentation, CVPR 2019 [paper] [code] [my repo] (1)
Vision-based Navigation with Language-based Assistance via Imitation Learning with Indirect Intervention, CVPR 2019 [paper] [code] [video] [my repo] (1)
Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation [paper] [my repo] (1)
The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation [paper] [code] [my repo] (1)
Learning to Learn How to Learn: Self-Adaptive Visual Navigation using Meta-Learning [paper] [code] [my repo] (1)
Learning to Navigate Using Mid-Level Visual Priors [paper] [code] [my repo] (1)
Are We Making Real Progress in Simulated Environments? Measuring the Sim2Real Gap in Embodied Visual Navigation [paper] [my repo] (1)
DD-PPO: Learning Near-Perfect PointGoal Navigators From 2.5 Billion Frames [paper] [blog] [my repo] (1)
Continuous Control with Deep Reinforcement Learning [paper] [blog0] [blog1] [code] [my repo] (1)

2019 Dec

Generative Adversarial Nets [paper] [my repo] (1)
Unsupervised Cross-Domain Image Generation [paper] [my repo] (1)

2019 Nov

Visual Semantic Navigation Using Scene Priors [paper] [my repo] (1)
Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning [paper] [my repo] (2)
Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real [paper] [my repo] (1)
Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World [paper] [my repo] (1)
On Evaluation of Embodied Navigation Agents [paper] [my repo] (1)
Data-Efficient Hierarchical Reinforcement Learning [paper] [my repo] (2)
Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense [paper] [my repo] (1)
An Empirical Study of Example Forgetting during Deep Neural Network Learning [paper] [my repo] (1)
Deep Hough Voting for 3D Object Detection in Point Clouds [paper] [code] [my repo] (1)
Motion Perception in Reinforcement Learning with Dynamic Objects [paper] [code] [my repo] (1)
Tiny Video Networks [paper] [my repo] (1)
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation [paper] [code] [my repo] (2)
Model-based Behavioral Cloning with Future Image Similarity Learning, CoRL 2019 [paper] [code] [my repo] (3)

2019 Oct

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution, CVPR 2019 [paper] [my repo] (1)
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression, CVPR 2019 [paper] [my repo] (1)
Model-Based Robot Imitation with Future Image Similarity, IJCV 2019 [paper] [my repo] (1)

2019 Sep

Asymmetric Actor Critic for Image-Based Robot Learning, RSS 2018 [paper] [my repo] (1)
Asynchronous Methods for Deep Reinforcement Learning, ICML 2016 [paper] [my repo] (2)
Unsupervised Image-to-Image Translation Networks, NIPS 2017 [paper] [code] [my repo] (1)
Reinforcement Learning With Unsupervised Auxiliary Tasks, ICLR 2017 [paper] [blog] [forum] [my repo] (2)
Sim-to-Real Reinforcement Learning for Deformable Object Manipulation, CoRL 2018 [paper] [my repo] (2)
Sim-to-Real Transfer with Neural-Augmented Robot Simulation, CoRL 2018 [paper] [my repo] (1)
Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks, CVPR 2019 [paper] [my repo] (4)
Sim-Real Joint Reinforcement Transfer for 3D Indoor Navigation, CVPR 2019 [paper] [my repo] (3)
Deep Drone Racing: Learning Agile Flight in Dynamic Environments [paper] [my repo] (1)
End to End Learning for Self-Driving Cars [paper] [code0] [code1] [my repo] (4)
Learning to Drive from Simulation without Real World Labels [paper] [blog] [my repo] (3)

2019 Aug

An Integrity Framework for Image-Based Navigation Systems [paper] [my repo] (1)
Forman-Ricci Flow for Change Detection in Large Dynamic Data Sets [paper] [my repo] (1)
Guiding Image Segmentation On The Fly: Interactive Segmentation From A Feedback Control Perspective [paper] [my repo] (1)
Non-Rigid 2D-3D Pose Estimation and 2D Image Segmentation [paper] [my repo] (1)
Robust Physical-World Attacks on Deep Learning Visual Classification [paper] [my repo] (1)
Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression [paper] [my repo] (1)
Learning Real-World Robot Policies by Dreaming [paper] [my repo] (2)

2019 Jul

Learning Latent Super-Events to Detect Multiple Activities in Videos [paper] [code] [my repo] (1)
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures [paper] [my repo] (1)

2019 Apr

Path Aggregation Network for Instance Segmentation [paper] [code] [pwd] [my repo] (1)
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network [paper] [my repo] (1)
PoTion: Pose MoTion Representation for Action Recognition [paper] [my repo] (1)
ContextVP: Fully Context-Aware Video Prediction [paper] [my repo] (1)
Flow-Grounded Spatial-Temporal Video Prediction from Still Images [paper] [my repo] (1)
Two-Stream Convolutional Networks for Action Recognition in Videos [paper] [my repo] (1)
MoCoGAN: Decomposing Motion and Content for Video Generation [paper] [code] [my repo] (1)
FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation [paper] [my repo] (1)
Action Anticipation with RBF Kernelized Feature Mapping RNN [paper] [my repo] (2)

2019 Mar

Decomposing Motion and Content for Natural Video Sequence Prediction [note] [paper] [review] [code] [my slide] [my repo] (3)
Reward Learning from Human Preferences and Demonstrations in Atari [paper] [my repo] (1)
Fast R-CNN [paper] [slide] [code] [my repo] (1)
SDC-Net: Video Prediction Using Spatially-Displaced Convolution [paper] [my repo] (1)
R-CNN, Fast R-CNN, Faster R-CNN, YOLO — Object Detection Algorithms [article] (3)
Automatic Delineation of the Myocardial Wall from CT Images via Shape Segmentation and Variational Region Growing [paper] [my repo] (1)
Active Contours Without Edges [paper] [my repo] (3)

bryanbocao / open-papernotes
