XIVO: X Inertial-aided Visual Odometry and Sparse Mapping

XIVO runs at 140FPS on stored data (here from a RealSense D435i sensor) or on live streams with latency of around 1-7ms, depending on the hardware. It takes as input video frames from a calibrated camera and inertial measurements from an IMU, and outputs a sparse point cloud with attribute features and 6 DOF pose of the camera. It performs auto-calibration of the relative pose between the camera and the IMU as well as the time-stamp alignment. More demos are available here, the aproach is described in this paper. XIVO does not perform post-mortem refinement (bundle adjustment, pose graph optimization), but that can be easily added as post-processing.

Overview

XIVO is an open-source repository for visual-inertial odometry/mapping. It is a simplified version of Corvis [Jones et al.,Tsotsos et al.], designed for pedagogical purposes, and incorporates odometry (relative motion of the sensor platform), local mapping (pose relative to a reference frame of the oldest visible features), and global mapping (pose relative to a global frame, including loop-closure and global re-localization — this feature, present in Corvis, is not yet incorporated in XIVO).

Corvis is optimized for speed, running at 200FPS on a commodity laptop computer, whereas XIVO prioritizes readability and runs at 140FPS. XIVO incorporates most of the core features of Corvis, including 3D structure in the state, serving as short-term memory; it performs auto-calibration (pose of the camera relative to the IMU, and time-stamp shift). It requires the camera to have calibrated intrinsics, which can be obtained using any open-source package such as OpenCV prior to using Corvis or XIVO. Corvis and XIVO require time-stamps, which can be done through the ROS drivers. Please refer to the ROS message interfaces (imu,image) for details on how to format the data for real-time use.

We provide several recorded sequences, and the ability to run XIVO off-line in batch mode for comparison with other methods. Note that some of these methods operate in a non-causal fashion, by performing batch optimization relative to keyframes, or in a sliding window mode, introducing capture latency. XIVO is causal, and processes only the last image frame received. The latency of a vision update (time interval between the instant of capture and the time where a state update is performed) is about 7ms, depending on the hardware used. Updates based on inertial measurements depends on the integration scheme, and is about 1ms for the default selection.

Corvis has been developed since 2005 [Jones et al.], with contributors including Eagle Jones [ijrr11], Konstantine Tsotsos [icra15], and Xiaohan Fei [cvpr17,eccv18,icra19]. If you use this code, or any the datasets provided, please acknowledge it by citing [Fei et al.].

While the ‘map’ produced by SLAM, consisting of a sparse set of attributed point features, is only functional to localization, with the attributes sufficient for detection in an image, XIVO has been used as a building block for semantic mapping [Dong et al.,Fei et al.], where the scene is populated by objects, bounded by dense surfaces. Research code for semantic mapping using XIVO can be found here.

Background and a bit of history

The first public demonstration of real-time visual odometry (Structure From Motion, or SFM) on commercial off-the-shelf hardware was given by Jin et al. [cvpr00] at CVPR 2000. Its use for visual augmentation (augmented reality) was demonstrated at ICCV 2001 [Favaro et al.], and ECCV 2002, where a virtual object was inserted in live video from a hand-held camera connected to a desktop PC. While SFM and SLAM are sometimes considered different, they are equivalent if structure is represented in the state and stored for later re-localization. This feature has been present in the work above since 2004 using feature group parametrizations, first introduced by Favaro et al. [iccv01]. Later public demonstrations of real-time visual odometry/SLAM include Davison [iccv03] and Nister et al. [cvpr04].

Corvis is based on the analysis of Jones-Vedaldi-Soatto [jones07] and was first demonstrated in 2008. The journal version of the paper describing the system was submitted in 2009 and published in 2011 [ijrr11]. It differed from contemporaneous approaches using the original MSCKF [Mourikis & Roumeliotis, 2007] in that it incorporated structure in the state of the filter, serving both as a reference for scale - not present in the original MSCKF - as well as a memory that enables mapping and re-localization, making it a hybrid odometry/mapping solution. One can of course also include in the model out-of-state feature constraints, in the manner introduced in the Essential Filter [Soatto 1994], or the MSCKF. The manner in which the Gauge transformation is handled is fundamentally different in Corvis and MSCKF: In the former, there is no uncertainty associated to Gauge transformations, since they just reflect an arbitrary choice of reference. In the latter, uncertainty grows over time.

XIVO builds on Corvis, has features in the state and can incorporate out-of-state constraints and loop-closure, represents features in co-visibile groups, as in Favaro et al. [iccv01], and includes auto-calibration as in Jones et al. [jones07]. XIVO was also part of the first visual-inertial-semantic mapping system first presented by Dong et al. [cvpr17] in 2016. Background material on SFM can be found in textbooks.

Requirements

This software is built and tested on Ubuntu 16.04 and 18.04 with compiler g++ 7.4.0. Porting to different platforms is relatively easy but not addressed in this repository.

Dependencies

OpenCV: Feature detection and tracking.
Eigen: Linear algebra.
Pangolin: Lightweight visualization.
glog: Logging.
gflags: Command-line options.
jsoncpp: Configuration.
(optional) googletest: Unit tests.
(optional) g2o: To use pose graph optimization.
(optional) ROS: To use in live mode with ROS.
(optional) pybind11: Python binding.

Dependencies are included in the thirdparty directory.

Build

To build, in Ubuntu 18.04, execute the build.sh script in the root directory of the project.

For detailed usage of the software, see the wiki.

License and Disclaimer

This software is property of the UC Regents, and is provided free of charge for research purposes only. It comes with no warranties, expressed or implied, according to these terms and conditions. For commercial use, please contact UCLA TDG.

Acknowledgment

If you make use of any part of this code or the datasets provided, please acknowledge this repository by citing the following:

@article{fei2019geo,
  title={Geo-supervised visual depth prediction},
  author={Fei, Xiaohan and Wong, Alex and Soatto, Stefano},
  journal={IEEE Robotics and Automation Letters},
  volume={4},
  number={2},
  pages={1661--1668},
  year={2019},
  publisher={IEEE}
}

PETGreen / xivo