Training module for U-Net, including tools for labeling, viewing data, and testing with different hyperparameters.
U-Net is a Convolutional Neural Network architecture built for Biomedical Image Segmentation (specifically, segmentation of neuronal structures in electron microscopic stacks).
"The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization."
The precise architecture is shown here:
The official paper for U-Net can be found here.
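If you have not seen the design before, the sketch below shows the contracting/expanding idea, written in PyTorch purely for illustration. It is not the network defined in this repository; the layer counts and channel sizes are arbitrary.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU, the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Minimal U-Net-style network: contract to capture context, then expand
    with skip connections for precise localization."""
    def __init__(self, in_ch=1, out_ch=1):
        super().__init__()
        # Contracting path: downsample to capture context.
        self.down1 = double_conv(in_ch, 32)
        self.down2 = double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottom = double_conv(64, 128)
        # Expanding path: upsample and concatenate skip connections.
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.conv2 = double_conv(128, 64)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.conv1 = double_conv(64, 32)
        self.head = nn.Conv2d(32, out_ch, 1)

    def forward(self, x):
        d1 = self.down1(x)
        d2 = self.down2(self.pool(d1))
        b = self.bottom(self.pool(d2))
        u2 = self.conv2(torch.cat([self.up2(b), d2], dim=1))
        u1 = self.conv1(torch.cat([self.up1(u2), d1], dim=1))
        return self.head(u1)
```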
This module is for the training and testing of the vision node in an autonomous vineyard navigation pipeline.
Code locations for each node:
- VISION
- ROW TRACKING
- PATH PLANNING (Not yet on GitHub)
- DRIVING (Not yet on GitHub)
There are many executable files in this repository. Except for main.py and any configuration files, each file has a multiline docstring at the top summarizing the file's purpose and giving its usage command.
For the main.py file, which controls training, evaluation, and testing of the network, usage is:

main.py {train, eval, test}
This code is built around being able to edit the configuration of the network easily, re-train, re-test, and compare results.
In your base directory, you should have a folder called experiments. This will contain your configurations. Currently, there is just one, called default. The goal is to be able to switch configurations easily, and save results in the folder for whatever configuration you're using.
Each configuration has a config.json file. Edit the hyperparameters for the network in this file.
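For illustration only, a script could load the active experiment's configuration as sketched below. The helper name is hypothetical, and the actual hyperparameter keys are whatever appears in experiments/default/config.json.

```python
import json
from pathlib import Path

# Hypothetical helper (not part of this repository): read the config for a
# named experiment so results can later be written back into its folder.
def load_experiment(name="default", root=Path("experiments")):
    exp_dir = root / name
    with open(exp_dir / "config.json") as f:
        config = json.load(f)
    return exp_dir, config

exp_dir, config = load_experiment("default")
print(sorted(config))  # list the hyperparameter names without assuming them
```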
It is necessary to convert the raw point clouds into a format that can be input to the network.
The Bird's Eye View (BEV) representation was chosen:
The point clouds need to be trimmed to ensure all points can fit within the range of the image. This means there must be mins and maxes for X, Y, and Z.
- Width represents the X direction of the robot (forward and backwards)
- Length represents the Y direction of the robot (side to side)
- Height represents the Z direction of the robot (up and down)
These mins and maxes are specified in the geometry hyperparameter in your configuration file.
For example, if you wish to only count points ahead of the robot, you may use a width range of (0.0, 10.0). If you wish to use nearly all points ahead and behind the robot, you may use a width range of (-10.0, 10.0).
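As a concrete illustration of this trimming step (the function and variable names below are illustrative, not the repository's API), points outside the configured ranges can simply be masked out:

```python
import numpy as np

def trim_cloud(points, width_range, length_range, height_range):
    """Keep only the [X, Y, Z] points inside the configured geometry ranges,
    so every surviving point maps into the BEV image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    mask = (
        (x >= width_range[0]) & (x < width_range[1]) &
        (y >= length_range[0]) & (y < length_range[1]) &
        (z >= height_range[0]) & (z < height_range[1])
    )
    return points[mask]

# Example: keep only points ahead of the robot, within +/-10 m to the side,
# and between 1 m below and 3 m above the sensor (height values are made up).
cloud = np.random.uniform(-20, 20, size=(1000, 3))
trimmed = trim_cloud(cloud, (0.0, 10.0), (-10.0, 10.0), (-1.0, 3.0))
```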
Point clouds are received as raw point arrays in which sparsity increases with distance. Since the point clouds are represented as BEV images, each [X, Y, Z] point must be mapped to exactly one pixel (and channel), so two points will share a pixel if they are close enough in Euclidean distance.
Resolution can therefore affect the performance of the model: with larger images and more feature channels, nearby points map to separate pixels, which can improve accuracy. Keep in mind that image size also affects speed and memory usage.
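To make the pixel mapping concrete, here is a minimal sketch of how trimmed points could be rasterized into a BEV occupancy grid. The grid size and the use of height slices as channels are assumptions for illustration, not the exact scheme used by this repository.

```python
import numpy as np

def to_bev(points, width_range, length_range, height_range, shape=(512, 512, 16)):
    """Rasterize trimmed [X, Y, Z] points into an occupancy grid.

    Rows follow X (width), columns follow Y (length), and channels slice Z
    (height). Two points land in the same cell whenever they are closer
    together than one cell, which is why resolution matters.
    """
    rows, cols, chans = shape
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = ((x - width_range[0]) / (width_range[1] - width_range[0]) * rows).astype(int)
    c = ((y - length_range[0]) / (length_range[1] - length_range[0]) * cols).astype(int)
    k = ((z - height_range[0]) / (height_range[1] - height_range[0]) * chans).astype(int)
    r = np.clip(r, 0, rows - 1)
    c = np.clip(c, 0, cols - 1)
    k = np.clip(k, 0, chans - 1)
    bev = np.zeros(shape, dtype=np.float32)
    bev[r, c, k] = 1.0  # nearby points may collapse onto the same cell
    return bev
```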
This repository includes an end-to-end labeling pipeline for converting ROS bag files into data that is ready for the network.
Once you have collected bag files that contain point cloud Lidar data and have set up your configuration, you are ready to begin labeling:
1. bag_extractor.py converts your bag files into individual numpy files, each containing a point cloud. In the file, provide the directory where your bags are located.
2. run_labeler.py launches an interactive tool for labeling the data and saves the results in individual numpy files. This will take a while.
3. view_labels.py helps you visualize the labels you have created, either in 3D or 2D. You may wish to run this a few times during step 2 to ensure you are making your labels correctly.
4. split_data.py splits the data into training and testing datasets and writes csv files containing the paths to the corresponding point cloud files and label files. In the file, provide the split ratio.
At this point, in your base directory, you should have a folder called 'data', which contains:
- raw: a folder containing .npz files that have the raw point clouds generated by step 1.
- labels: a folder containing .npz files that have the labels generated by step 2.
- train.csv, test.csv: files containing paths to individual groups of raw and labeled data.
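As a quick sanity check on this layout, the sketch below reads one (point cloud, label) pair from train.csv. It assumes each csv row holds exactly two paths, which you should verify against split_data.py.

```python
import csv
import numpy as np

# Assumption for illustration: each row of train.csv holds a path to a raw
# point-cloud .npz file and a path to the matching label .npz file.
with open("data/train.csv") as f:
    rows = list(csv.reader(f))

cloud_path, label_path = rows[0]

# .npz archives are dictionaries of arrays; list their keys rather than
# assuming names, since the exact layout is set by the labeling scripts.
print(np.load(cloud_path).files)
print(np.load(label_path).files)
```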
Training on an NVIDIA RTX 2080 with 2,087 point clouds and a batch size of 4, each epoch takes around 2.9 minutes.
With validation after every epoch, full training to 120 epochs will take around 6-8 hours.
Currently, the network is trained using only Classification Loss (binary cross-entropy).
However, in the future, I hope to add Embedding Loss, which attempts to increase the separation between classes (in this case, vine or no vine) by applying contrastive loss or center loss to the feature space or the classifier space. This has been shown to decrease the difference between training and testing Classification Loss for aerial images.
For details, see this paper.
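As a rough sketch of what that could look like (assuming a PyTorch setup with per-pixel feature vectors and binary labels; this is not code from this repository), a center-loss style term pulls each class's features toward a learned center while binary cross-entropy handles classification:

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Toy center loss for two classes (vine / no vine): pull each pixel's
    feature vector toward the learned center of its class."""
    def __init__(self, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.zeros(2, feat_dim))

    def forward(self, features, labels):
        # features: (N, feat_dim) per-pixel embeddings; labels: (N,) in {0, 1}
        return ((features - self.centers[labels]) ** 2).sum(dim=1).mean()

# Combined objective: classification loss plus a small embedding term.
bce = nn.BCEWithLogitsLoss()
center = CenterLoss(feat_dim=32)

def total_loss(logits, features, labels, weight=0.01):
    return bce(logits, labels.float()) + weight * center(features, labels)
```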
- John Macdonald for Lidar pre-processing, most of the labeling pipeline, and more
- Olaf Ronneberger, Philipp Fischer, Thomas Brox for the U-Net Architecture
- Jon Binney for numpy_pc2.py