Read raw point clouds and extract lines using a Convolutional Neural Network called U-Net!
The code is separated into three sections for each Lidar frame:
- First, the point cloud goes through pre-processing, where it is converted from an unordered array of [X, Y, Z] points to a [400 x 400] image with 24 channels (which encode point heights).
- Next, the [1, 24, 400, 400] Tensor is passed through the trained U-Net model, which outputs a prediction map.
- Lastly, the prediction map goes through post-processing, where a Probabilistic Hough Transform is applied, followed by clustering and selection (sketched below).
These final lines are then published via the /unet_lines ROS topic.
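As a rough illustration of the post-processing step, applying a Probabilistic Hough Transform to the prediction map might look like the sketch below. The use of OpenCV, the probability threshold, and the Hough parameters are assumptions for illustration, not this node's exact implementation:

```python
# Hedged sketch only: assumes the prediction map is a [400 x 400] float array
# of per-pixel line probabilities; the threshold and Hough parameters are
# illustrative placeholders.
import numpy as np
import cv2

def extract_line_segments(prediction_map, prob_threshold=0.5):
    # Binarize the prediction map so it can be fed to the Hough transform
    binary = (prediction_map > prob_threshold).astype(np.uint8) * 255

    # Probabilistic Hough Transform returns candidate segments as [x1, y1, x2, y2]
    segments = cv2.HoughLinesP(
        binary,
        rho=1,
        theta=np.pi / 180,
        threshold=30,
        minLineLength=20,
        maxLineGap=10,
    )
    return [] if segments is None else [seg[0] for seg in segments]
```

Clustering and selection then reduce these candidate segments to the final lines that get published.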
U-Net is a Convolutional Neural Network architecture built for Biomedical Image Segmentation (specifically, segmentation of neuronal structures in electron microscopic stacks).
"The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization."
The precise architecture is shown here:
The official paper for U-Net can be found here.
This node is part of a larger autonomous navigation pipeline that is currently being developed.
This node is the VISION node represented in the full pipeline below:
Each node is a separate ROS node, each receiving and publishing relevant data.
Code locations for the other nodes are listed below:
- ROW TRACKING
- PATH PLANNING
- DRIVING (Not yet on Github)
It's very easy to get this node up and running! Simply complete these steps:
- Replace the first line of `main.py` with your Python interpreter path
- Add in the path to your trained state dictionary here (see the sketch after these steps)
- If you wish to save images of the point clouds or prediction maps, add in paths
- Make sure you have all necessary packages installed
- Start up your `roscore` and run: `rosrun cmu_unet main.py`
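For reference, loading a trained state dictionary in PyTorch generally looks like the sketch below; the stand-in model and checkpoint path are placeholders, not the actual U-Net class or path used by this node:

```python
# Purely illustrative: the stand-in model and checkpoint path are placeholders,
# not the actual U-Net class or path used in main.py.
import torch
import torch.nn as nn

model = nn.Conv2d(24, 1, kernel_size=3, padding=1)  # stand-in for the U-Net
state_dict = torch.load("/path/to/state_dict.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```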
A sizeable dataset containing point clouds and masks (ground truths) is required for training.
Python scripts for extracting point clouds from bags and saving them as .np files are provided here.
A custom labeler for labeling the point clouds with ground truth lines is also provided in the code above.
The code for training, labeling, and testing loss functions is available here.
Check out that repository for details on training with different loss functions.
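For context, extracting point clouds from a bag can be sketched roughly like this; the bag path and Lidar topic name are assumptions for illustration, and the provided scripts may differ:

```python
# Rough sketch, assuming ROS 1 Python tools; the bag path and topic name
# ("/velodyne_points") are placeholders for illustration.
import numpy as np
import rosbag
import sensor_msgs.point_cloud2 as pc2

with rosbag.Bag("lidar_data.bag") as bag:
    for i, (topic, msg, t) in enumerate(bag.read_messages(topics=["/velodyne_points"])):
        # Each message becomes an [N, 3] array of X, Y, Z points
        points = np.array(list(pc2.read_points(msg, field_names=("x", "y", "z"), skip_nans=True)))
        np.save("cloud_%06d.npy" % i, points)
```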
In pre-processing, the point clouds need to be trimmed to ensure all points can fit within the [24, 400, 400] image. This means there must be minimums and maximums for width, length, and height.
- Width represents the X direction of the robot (forward and backwards)
- Length represents the Y direction of the robot (side to side)
- Height represents the Z direction of the robot (up and down)
These mins and maxes are specified as tuples in main.py and should be adjusted for your use.
For example, if you wish to only count points ahead of the robot, you may use a width range of (0.0, 10.0). If you wish to use nearly all points ahead and behind the robot, you may use a width range of (-10.0, 10.0).
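As an illustration, the trimming could be pictured like this; the tuple names and ranges below are placeholders, not the exact variables defined in `main.py`:

```python
# Illustrative only: the range names and values below are placeholders,
# not the exact tuples defined in main.py.
import numpy as np

WIDTH_RANGE  = (-10.0, 10.0)   # X: forward/backward (meters)
LENGTH_RANGE = (-10.0, 10.0)   # Y: side to side (meters)
HEIGHT_RANGE = (-1.0, 3.0)     # Z: up/down (meters)

def trim_point_cloud(points):
    """Keep only points whose X, Y, Z fall inside the configured ranges."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    mask = (
        (x >= WIDTH_RANGE[0])  & (x <= WIDTH_RANGE[1]) &
        (y >= LENGTH_RANGE[0]) & (y <= LENGTH_RANGE[1]) &
        (z >= HEIGHT_RANGE[0]) & (z <= HEIGHT_RANGE[1])
    )
    return points[mask]
```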
Point clouds are received as raw point arrays, where sparsity increases with distance. Since we represent the point clouds as Birds Eye View images, each [X, Y, Z] point must be matched to exactly one pixel (and channel). Therefore, two points may map to the same pixel if their Cartesian coordinates are close enough.
We can therefore see that resolution may affect the performance of the model. Using larger images (say, [600, 600, 32]), we may increase the accuracy of the model by mapping nearby points to separate pixels. Keep in mind that the image size also affects the speed and memory usage of the model.
Depending on your Point Cloud Trimming, you may wish to decrease or increase the size of the input image. For example, if you only use a small area around the robot, a smaller image size (say, [256, 256, 18]) may be sufficient.
The pixel mapping is done here, in pre-processing.
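A minimal sketch of that Birds Eye View binning, assuming trimmed ranges like those above; the ranges, image shape, and occupancy encoding are illustrative, not the exact pre-processing code:

```python
# Minimal sketch of the Birds Eye View binning described above; the ranges,
# image shape, and occupancy encoding are illustrative placeholders.
import numpy as np

def point_cloud_to_bev(points,
                       width_range=(-10.0, 10.0),
                       length_range=(-10.0, 10.0),
                       height_range=(-1.0, 3.0),
                       shape=(24, 400, 400)):
    channels, rows, cols = shape
    bev = np.zeros(shape, dtype=np.float32)

    # Normalize each coordinate into [0, 1) over its trimmed range
    x = (points[:, 0] - width_range[0])  / (width_range[1]  - width_range[0])
    y = (points[:, 1] - length_range[0]) / (length_range[1] - length_range[0])
    z = (points[:, 2] - height_range[0]) / (height_range[1] - height_range[0])

    # Scale to integer pixel / channel indices and clamp to valid bins
    r = np.clip((x * rows).astype(int),     0, rows - 1)
    c = np.clip((y * cols).astype(int),     0, cols - 1)
    k = np.clip((z * channels).astype(int), 0, channels - 1)

    # Nearby points can land in the same cell, so the last write wins
    bev[k, r, c] = 1.0
    return bev
```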
Training on an NVIDIA RTX 2080 with 2,087 point clouds and a batch size of 4, each epoch takes around 2.9 minutes.
- John Macdonald for Lidar pre-processing and labeling
- Olaf Ronneberger, Philipp Fischer, Thomas Brox for the U-Net Architecture
- Jon Binney for `numpy_pc2.py`