Official repository of the paper "Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry"
Monocular visual odometry estimates the position of an agent from the images of a single camera, and it is applied in autonomous vehicles, medical robots, and augmented reality. However, monocular systems suffer from scale ambiguity due to the lack of depth information in 2D frames. This paper shows how the dense prediction transformer model can be applied to scale estimation in monocular visual odometry systems. Experimental results show that the scale drift of monocular systems can be reduced through accurate depth maps estimated by this model, achieving performance competitive with the state of the art on a visual odometry benchmark.
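The core idea of depth-based scale recovery can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: the function names are hypothetical, and it assumes scale is taken as the median ratio between network-predicted depths and up-to-scale triangulated depths.

```python
import numpy as np

def recover_scale(pred_depths, triangulated_depths, eps=1e-8):
    """Estimate the absolute scale as the median ratio between
    depths predicted by the network (metric) and depths triangulated
    from two views (up to an unknown scale). Hypothetical sketch."""
    pred = np.asarray(pred_depths, dtype=float)
    tri = np.asarray(triangulated_depths, dtype=float)
    valid = (pred > eps) & (tri > eps)  # ignore degenerate points
    return float(np.median(pred[valid] / tri[valid]))

def rescale_translation(t, scale):
    """Apply the recovered scale to the up-to-scale translation
    of the estimated relative camera pose."""
    return scale * np.asarray(t, dtype=float)
```

The median makes the estimate robust to outlier correspondences, which is a common choice for this kind of per-frame scale alignment.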
Download the KITTI odometry dataset (grayscale).
In this work, we use the .jpg format. You can convert the dataset to .jpg format with png_to_jpg.py.
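The conversion step can be sketched like this. Note this is an assumption about what png_to_jpg.py does, not its actual code; it requires Pillow, which is imported lazily inside the function, and the helper names are hypothetical.

```python
from pathlib import Path

def jpg_name(png_path):
    """Map a .png file path to its .jpg counterpart (hypothetical helper)."""
    return Path(png_path).with_suffix(".jpg")

def convert_sequence(src_dir, dst_dir):
    """Convert every .png under src_dir to a .jpg under dst_dir,
    preserving the relative directory layout. Assumes Pillow is installed."""
    from PIL import Image  # assumption: Pillow provides the image I/O
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    for png in sorted(src_dir.rglob("*.png")):
        out = dst_dir / jpg_name(png.relative_to(src_dir))
        out.parent.mkdir(parents=True, exist_ok=True)
        Image.open(png).convert("RGB").save(out, quality=95)
```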
Create a symbolic link (Windows) or a soft link (Linux) to the dataset in the dataset folder:
- On Windows:
mklink /D <path_to_your_project>\DPT-VO\dataset <path_to_your_downloaded_dataset>
- On Linux:
ln -s <path_to_your_downloaded_dataset> <path_to_your_project>/DPT-VO/dataset
Then, the data structure should be as follows:
|---DPT-VO
    |---dataset
        |---sequences_jpg
            |---00
                |---image_0
                    |---000000.jpg
                    |---000001.jpg
                    |---...
                |---image_1
                    |---...
                |---image_2
                    |---...
                |---image_3
                    |---...
            |---01
                |---...
Download the DPT trained weights and save them in the weights folder.
For more details please check the original DPT repository.
- Create a virtual environment using Anaconda and activate it:
conda create -n dpt-vo python==3.8.0
conda activate dpt-vo
- Install dependencies (with environment activated):
pip install -r requirements.txt
Run the main.py code with the following command:
python main.py -s <sequence_number>
You can also use a different path to the dataset by changing the arguments --data_path and --pose_path:
python main.py -d <path_to_dataset> -p <path_to_gt_poses> -s <sequence_number>
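The command-line interface implied by the flags above can be sketched with argparse. This mirrors the README's flags as an assumption; the defaults and exact option names in the repository's main.py may differ.

```python
import argparse

def build_parser():
    """Argument parser matching the flags described in the README
    (a sketch; defaults here are assumptions, not the repo's values)."""
    p = argparse.ArgumentParser(description="DPT-VO runner (sketch)")
    p.add_argument("-d", "--data_path", default="dataset/sequences_jpg",
                   help="root folder of the converted .jpg sequences")
    p.add_argument("-p", "--pose_path", default="dataset/poses",
                   help="folder with the KITTI ground-truth poses")
    p.add_argument("-s", "--sequence", required=True,
                   help="KITTI sequence number, e.g. 00")
    return p
```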
The evaluation is done with the KITTI odometry evaluation toolbox. Please go to the evaluation repository for more details about the evaluation metrics and how to run the toolbox.
Please cite our paper if you find this research useful in your work:
@INPROCEEDINGS{Francani2022,
  title={Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry},
  author={André O. Françani and Marcos R. O. A. Maximo},
  booktitle={2022 Latin American Robotics Symposium (LARS), 2022 Brazilian Symposium on Robotics (SBR), and 2022 Workshop on Robotics in Education (WRE)},
  days={18-21},
  month={oct},
  year={2022},
}
Some of the functions were borrowed and adapted from three amazing works: DPT, DF-VO, and monoVO.