Autonomous Driving Papers with Code

A collection of papers and code on autonomous driving

A Famous Repo for Autonomous Driving
Awesome Autonomous Driving
Trajectory Prediction
- 19-ICCV-The Trajectron: Probabilistic Multi-Agent Trajectory Modeling with Dynamic Spatiotemporal Graphs, [pdf]
- 19-ICCV-Analyzing the Variety Loss in the Context of Probabilistic Trajectory Prediction, [pdf]
- 19-ICCV-PRECOG: PREdiction Conditioned On Goals in Visual Multi-Agent Settings, [pdf], [project]
- 19-AAAI-oral-TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents, [pdf], [pytorch code]
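  - Proposes an instance layer to model the motion of instances and their interactions, and a category layer to model the similarity among instances of the same category.
  - Predicts the trajectories of cars, pedestrians, and bicycles with a single unified model.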
- 19-CVPR-Precognition Workshop-Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs, [pdf], [pytorch code]
- 18-CVPR-Social GAN: Socially Acceptable Trajectories with GANs, [pdf], [pytorch code]
  - Treats trajectory prediction as a point-sequence generation problem, hence the GAN-based architecture.
  - Applies an L2 loss to the top-k confident predictions to increase variety.
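- 18-CVPR-Trajnet Workshop-Convolutional Social Pooling for Vehicle Trajectory Prediction, [pdf]
  - Replaces the social pooling of Social LSTM with convolutional and pooling layers. The hidden features output by the LSTMs are placed in a tensor whose number of columns is the number of lanes, whose number of rows is the total input height divided by the length of a single car, and whose depth is the hidden dimension.
- 16-CVPR-Social LSTM: Human Trajectory Prediction in Crowded Spaces, [pdf], [pytorch code]
  - Proposes social pooling to model the relationships between spatially nearby people: within a grid of manually chosen size, the hidden features that fall into the same grid cell are summed.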
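
Two of the notes above lend themselves to a short sketch: grid-based social pooling (Social LSTM) and a best-of-k L2 loss in the spirit of Social GAN's variety loss. This is a minimal pure-Python illustration, not the authors' code. The function names, `grid_size`, `cell`, and the choice to sum per-step distances are assumptions made for the example.

```python
import math


def social_pool(positions, hidden, idx, grid_size=4, cell=1.0):
    """Sum neighbours' hidden states into a grid centred on agent `idx`.

    positions: list of (x, y) tuples; hidden: list of equal-length lists.
    Returns a grid_size x grid_size x hidden_dim nested list.
    grid_size and cell are hand-chosen hyperparameters (assumed values).
    """
    dim = len(hidden[0])
    grid = [[[0.0] * dim for _ in range(grid_size)] for _ in range(grid_size)]
    half = grid_size * cell / 2.0
    cx, cy = positions[idx]
    for j, (x, y) in enumerate(positions):
        if j == idx:
            continue  # an agent does not pool its own state
        dx, dy = x - cx, y - cy
        if not (-half <= dx < half and -half <= dy < half):
            continue  # neighbour lies outside the pooling window
        col = int((dx + half) // cell)
        row = int((dy + half) // cell)
        for d in range(dim):
            grid[row][col][d] += hidden[j][d]  # same cell -> accumulate
    return grid


def best_of_k_l2(samples, target):
    """Minimum, over k sampled trajectories, of the summed per-step L2 error."""
    def error(traj):
        return sum(math.hypot(p[0] - q[0], p[1] - q[1])
                   for p, q in zip(traj, target))
    return min(error(traj) for traj in samples)
```

With agents at `(0, 0)`, `(0.5, 0.5)`, `(0.6, 0.7)`, and `(5, 5)`, pooling for the first agent puts the two nearby neighbours in the same cell and ignores the distant one. Penalising only the best of k samples leaves the other samples free to cover different plausible futures.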
3D Tracking/Object Detection/Segmentation/Depth Estimation
- 19-ICCV-Joint Monocular 3D Vehicle Detection and Tracking, [pdf], [pytorch code]
- 19-ICCV-oral-Deep Hough Voting for 3D Object Detection in Point Clouds, [pdf]
- 19-ICCV-PU-GAN: a Point Cloud Upsampling Adversarial Network, [pdf]
- 19-CVPR-ApolloCar3D: A Large 3D Car Instance Understanding Benchmark for Autonomous Driving, [pdf]
- 19-CVPR-Stereo R-CNN based 3D Object Detection for Autonomous Driving, [pdf], [pytorch code]
- 19-arXiv-A Baseline for 3D Multi-Object Tracking, [pdf], [code]
  - SOTA for 3D Multi-Object Tracking on KITTI
- 19-CVPR-PointPillars: Fast Encoders for Object Detection from Point Clouds, [pdf], [pytorch code]
- 19-CVPR-LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving, [pdf]
- 19-CVPR-Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving, [pdf], [pytorch code]
- 19-ICCV-3D-RelNet: Joint Object and Relational Network for 3D Prediction, [pdf], [pytorch code]
- 19-ICCV-3D Point Cloud Learning for Large-scale Environment Analysis and Place Recognition, [pdf]
- 19-ICCV-oral-Can GCNs Go as Deep as CNNs? [pdf], [tensorflow code], [pytorch code]
  - Brings the residual/dense connections and dilated convolutions of CNNs to a deep 56-layer GCN, applied to point cloud semantic segmentation.
- 18-arXiv-Complex-YOLO: Real-time 3D Object Detection on Point Clouds, [pdf], [pytorch code]
- 17-CVPR-Multi-View 3D Object Detection Network for Autonomous Driving, [pdf], [tensorflow code]
- 17-arXiv-SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud, [pdf], [tensorflow code]
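
The note on "Can GCNs Go as Deep as CNNs?" mentions residual connections and dilated convolutions on graphs. The toy sketch below shows one way these two ideas can be expressed: a dilated k-nearest-neighbour lookup followed by a graph layer with a skip connection. It has no learned weights and is not the paper's implementation. `dilated_knn` and `residual_layer` are hypothetical names, and max aggregation is an assumed choice.

```python
def dilated_knn(points, k, dilation):
    """For each point, rank the other points by distance, look at the
    k * dilation nearest, and keep every `dilation`-th of them."""
    neighbours = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])),
        )
        neighbours.append(order[: k * dilation : dilation])
    return neighbours


def residual_layer(features, neighbours):
    """Toy graph layer: max-aggregate neighbour features per channel,
    then add the input back (the residual / skip connection).
    Assumes every point has at least one neighbour."""
    out = []
    for i, f in enumerate(features):
        agg = [max(features[j][d] for j in neighbours[i])
               for d in range(len(f))]
        out.append([a + x for a, x in zip(agg, f)])
    return out
```

Dilation widens the receptive field without increasing the number of neighbours per point. The skip connection is what allows many such layers to be stacked, as residual connections do in CNNs.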
Action Recognition/Prediction
- A famous repo for action recognition: [Awesome Action Recognition]
- 19-ICCV-oral-SlowFast Networks for Video Recognition, [pdf], (code will be available)
- 19-CVPR-Time-Conditioned Action Anticipation in One Shot, [pdf]
- 19-CVPR-oral-Relational Action Forecasting, [pdf]
- 19-CVPR-Peeking into the Future: Predicting Future Person Activities and Locations in Videos, [pdf], [tensorflow code]
  - A unified model that uses videos for both action prediction and trajectory prediction of pedestrians.
  - Uses pretrained models (except for the last one) for object detection, person key-point detection, scene segmentation, and bounding boxes of objects and persons to extract visual features for appearance, motion, person-scene interaction, and person-object interaction, respectively.
- 19-ICCV-Exploring the Limitations of Behavior Cloning for Autonomous Driving, [pdf], [python code]
- 18-ECCV-Action Anticipation By Predicting Future Dynamic Images, [pdf]
  - Trained with an L2 reconstruction loss on dynamic images, a classification loss on dynamic images, and an L2 reconstruction loss on RGB frames.
- 19-arXiv-Temporal Recurrent Networks for Online Action Detection, [pdf], [pytorch code]
  - It jointly models the historical and future temporal context under the constraint of the online setting
- 17-ICCV-Online Real-time Multiple Spatiotemporal Action Localisation and Prediction, [pdf], [pytorch code]
  - Uses real-time SSD (Single Shot MultiBox Detector) CNNs to regress and classify detection boxes in each video frame potentially containing an action of interest.
  - Proposes an online algorithm to incrementally construct and label "action tubes" from the SSD frame-level detections.
- 17-ICCV-Temporal Action Detection with Structured Segment Networks, [pdf], [pytorch code]
- 17-ICCV-Encouraging LSTMs to Anticipate Actions Very Early, [pdf], [theano+keras code]
  - Uses a CNN to extract visual (context) features and Class Activation Maps to extract motion features.
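
The "action tubes" note above describes linking frame-level detections into tubes online. The sketch below is a heavily simplified illustration of that idea, assuming greedy matching by box overlap. It ignores class scores, tube labelling, and tube termination. `iou`, `link_frame`, and the `min_iou` threshold are hypothetical names and values, not taken from the paper's code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def link_frame(tubes, detections, min_iou=0.3):
    """Extend each tube with its best-overlapping unused detection from
    the new frame; detections left over start new tubes."""
    used = set()
    for tube in tubes:
        last = tube[-1]
        best, best_iou = None, min_iou
        for i, det in enumerate(detections):
            if i in used:
                continue  # each detection joins at most one tube
            overlap = iou(last, det)
            if overlap >= best_iou:
                best, best_iou = i, overlap
        if best is not None:
            tube.append(detections[best])
            used.add(best)
    for i, det in enumerate(detections):
        if i not in used:
            tubes.append([det])
    return tubes


# Usage: feed frames one at a time, as an online method would.
frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 1, 11, 11)],
    [(2, 2, 12, 12), (51, 50, 61, 60)],
]
tubes = []
for detections in frames:
    tubes = link_frame(tubes, detections)
```

Because each call only needs the current tubes and the current frame's boxes, the linking can run incrementally as frames arrive.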
Video Prediction/Generation
- 18-ICML-Hierarchical Long-term Video Prediction without Supervision, [pdf], [tensorflow code]
- 18-arXiv-Stochastic Adversarial Video Prediction, [pdf], [tensorflow code]
- 18-arXiv-Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning, [pdf], [keras code]
- 18-NeurIPS-Learning to Decompose and Disentangle Representations for Video Prediction, [pdf], [pytorch+pyro code]
- 16-arXiv-Learning a Driving Simulator, [pdf], [tensorflow code]
- 16-arXiv-Deep Multi-Scale Video Prediction Beyond Mean Square Error, [pdf], [tensorflow code]
Dataset
- [HDD] 18-CVPR. [pdf]. A dataset for (own) driving scene understanding. Nearly 104 hours of 137 driving sessions in the San Francisco Bay Area. The dataset was collected from a vehicle with a front-facing camera, and includes frame-level annotations of 11 goal-oriented actions (e.g., intersection passing, left turn, and right turn). The dataset also includes readings from a variety of non-visual sensors collected by the instrumented vehicle’s Controller Area Network (CAN bus).
- [KITTI] Tasks of interest are: stereo evaluation, optical flow evaluation, depth estimation, visual odometry, 3D object detection and 3D tracking, semantic segmentation
- [APOLLO Scape] Scene Parsing, Car Instance, Lane Segmentation, Self Localization, Trajectory, Detection/Tracking, Stereo
- [nuScenes] The first large-scale dataset to provide data from the entire sensor suite of an autonomous vehicle (6 cameras, 1 LIDAR, 5 RADAR, GPS, IMU). The goal of nuScenes is to look at the entire sensor suite. The full dataset includes approximately 1.4M camera images, 390k LIDAR sweeps, 1.4M RADAR sweeps and 1.4M object bounding boxes in 40k keyframes.
- [Caltech Lanes] The archive includes 1225 individual frames taken from a camera mounted on Alice, along with the labeled lanes. The dataset is divided into four individual clips: cordova1 with 250 frames, cordova2 with 406 frames, washington1 with 337 frames, and washington2 with 232 frames.
- [Virtual KITTI] 2D/3D object detection, multi-object tracking
- [Berkeley DeepDrive] Object detection, instance segmentation, drivable decision, lane marking. Explore 100,000 HD video sequences of over 1,100-hour driving experience across many different times in the day, weather conditions, and driving scenarios. Our video sequences also include GPS locations, IMU data, and timestamps.
- [VIENA2] Synthetic driving data for driving manoeuvre, accidents, pedestrian intentions and front car intentions. 15K HD videos with frame size of 1920x1280, corresponding to 2.25M annotated frames. Each video contains 150 frames captured at 30fps depicting a single action from one scenario.
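
Several of the figures quoted in the dataset descriptions above can be cross-checked with simple arithmetic. The snippet below uses only numbers stated in this list. The variable names are made up for the example, and the Berkeley DeepDrive average is only a rough lower bound because the list says "over 1,100" hours.

```python
# Caltech Lanes: the four clips should add up to the stated frame total.
caltech_clips = {
    "cordova1": 250,
    "cordova2": 406,
    "washington1": 337,
    "washington2": 232,
}
caltech_total = sum(caltech_clips.values())  # 1225 frames

# VIENA2: 15K videos of 150 frames each, captured at 30 fps.
viena_frames = 15_000 * 150      # 2,250,000 frames, i.e. 2.25M
viena_clip_seconds = 150 / 30    # 5.0 seconds per video

# Berkeley DeepDrive: 1,100 hours spread over 100,000 sequences.
bdd_avg_seconds = 1_100 * 3600 / 100_000  # about 39.6 seconds per sequence
```

The totals agree with the descriptions: 1225 frames for Caltech Lanes and 2.25M frames for VIENA2.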
Challenge
- [19-CVPR WAD] Workshop on Autonomous Driving
Miscellaneous
- Paper with Code for Autonomous Driving and Self-Driving Cars
- [MMAction]
- [3D ResNet PyTorch] Video Classification using 3D ResNet
- [PyTorch Video Research] A unified framework for Action classification, Action localization, Spatial Action localization, Inpainting, Video Alignment, Triplet Classification.

ChaofanTao / autonomous_driving_papers
