depth-estimation depth-image pytorch fpn

Depth-Estimation-PyTorch

Monocular depth estimation using Feature Pyramid Network implemented in PyTorch 1.1.0

To test the model I provide

Change your working directory to model-test.
Unzip the nyuv2.rar to current directory.
Download the trained model from release. Rename to fyn_model.pt and save to current directory.
Run the Jupyter Notebook in test.ipynb.

Note: the model I provide was trained only on a dataset containing 1k images of scenes in basements. The purpose of this model was only to test whether the network architecture works for depth estimation. The test dataset provided in model-test folder also contains only images of basements.

To train your own model

Requirements

Python3
PyTorch (The version in my code is PyTorch 1.1.0)
CUDA (It's almost impossible to train on a CPU because the network is very deep)

Prepare your own dataset

As the dataset is very large, I didn't provide them directly in this repository. However, you can download the depth dataset and load them with your own code.

Modify the code of the data loading part in fyn_main.py first to ensure that you can load the dataset correctly. In my original code, I store the RGB and depth images in two folders and have a pickle file to relate them, both for train dataset and test dataset.

NYU Depth V2 Dataset: This is the dataset I used in my code for the training of the network. If you want to train on this dataset, then you can refer to my script process_dataset.py to process the data you download. Again, after you have processed the data, you need to check the data loading part to see if you can load the dataset correctly.
Training on other datasets: You can also use the architecture to train on other datasets, like KITTI, InteriorNet, MannequinChallenge dataset, which all contains depth map data. You will need to do the data processing and data loading step on your own. Also, although the RGB image size and depth image size might be different from NyuV2, it can still be trained using the same network architecture. Just remember the neural network takes a Width x Height x 3 RGB image as input and ouputs a 1/4 Width x 1/4 Height x 1 grayscale image.

Start training

After you make sure the dataset can be loaded correctly, you can run fyn_main.py to start training.

Architecture

Our architecture is basically a Feature Pyramid Network (FPN) with ResNet101 as backbone.
FPN is an effective backbone for monocular depth estimation because of its ability to extract features and semantics at different scales. It can achieve its potential if guided by proper loss functions.
Two consecutive 3x3 convolutions for feature processing.
ReLU is used as activation function in the last two convolutional layers. No non-linearity in the top-down branch of FPN.
Outputs prediction of size ¼.
Learning rate decay is deployed during training.

Dataset

We use the NYU Depth V2 Dataset for training and testing. The RGB image and the depth image in the dataset are both of size 640x480. During training, the RGB image is loaded as 640x480 and the depth image is loaded and then resized to 160x120. The input of the network is a RGB image with size 640x480, and the output is a grayscale image of size 160x120, which is the depth map we need.

NyuV2 dataset contains images of indoor scenes, which is a limitation for depth estimation on other scenes. You can check other dataset to include more scenes in your model.

Loss Function

We employed a self-defined loss function in our model -- the Gradient Loss:

The gradient of depth maps is obtained by a Sobel filter; the gradient loss is the L1 norm of the difference.

Results

Here are some results on test dataset that contains scenes of basements:

Reference

About

Monocular depth estimation using Feature Pyramid Network implemented in PyTorch 1.1.0

depth-estimation depth-image pytorch fpn

MIT License

Languages

Language:Jupyter Notebook 93.9%Language:Python 6.1%