This repository contains the implementation of a visual localization pipeline for city-scale environments. The project explores several approaches to visual localization and weighs their pros and cons in order to build the most suitable city-scale pipeline. The main objective is to robustly and accurately estimate the camera position from a single image and its camera parameters, which are usually embedded in the image automatically as EXIF metadata.
- Robust and accurate camera position estimation based on a single image and camera parameters.
- Flexibility and extensibility to accommodate the growth of the reference image database without the need for retraining.
- Handy tools for collecting and processing geotagged datasets for visual localization based on Google Street View API.
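To illustrate why a retrieval-based design can absorb new reference images without retraining, here is a minimal sketch of nearest-neighbour retrieval over precomputed global descriptors. It is not the code in src/geonavpy: the function names `cosine` and `retrieve`, the 64-dimensional random vectors, and the choice of cosine similarity are assumptions made for the example, and plain lists stand in for the descriptor arrays.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, reference, k=3):
    """Return the indices of the k reference descriptors closest to query."""
    ranked = sorted(range(len(reference)),
                    key=lambda i: cosine(query, reference[i]), reverse=True)
    return ranked[:k]


random.seed(0)
dim = 64  # hypothetical descriptor size, chosen only for the example
# Random vectors stand in for descriptors computed by a fixed, pretrained model.
reference = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(100)]

# A slightly perturbed copy of entry 42 plays the role of a query image.
query = [v + random.gauss(0, 0.01) for v in reference[42]]
top = retrieve(query, reference)
print(top[0])

# Growing the database is an append; the descriptor model is left untouched.
reference.append([random.gauss(0, 1) for _ in range(dim)])
top_new = retrieve(reference[-1], reference, k=1)
print(top_new)
```

Because the descriptor model is fixed, adding a reference image only means computing one more vector and appending it to the search set.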
- src/geonavpy (Python package): Contains the algorithms and modules for the visual localization pipeline.
- bin/ (directory): Includes scripts for generating the reference database and running experiments.
- Clone the repository:
git clone https://github.com/Tsapiv/visual-localization-pipeline.git
cd visual-localization-pipeline
- Install the required dependencies:
pip install poetry
poetry install
- Run experiments:
python bin/run_experiments.py --reference_set <reference_set_path> --descriptor_type <descriptor_type> --query_set <query_set_path> --exp_name <experiment_name> --conf <config_file_path>
  - --reference_set: Path to the reference dataset.
  - --descriptor_type: Type of global descriptor used in retrieval (default: radenovic_gldv1).
  - --query_set: Path to the query dataset.
  - --exp_name: Experiment name (optional).
  - --conf: Path to the config file.
- Generate reference database:
4.1. Preview location and adjust spacing using OpenStreetMap API
python bin/preview.py [-h] (--point POINT POINT | --place PLACE) [--radius RADIUS] [--spacing SPACING] [--jitter LOWER_JITTER UPPER_JITTER] [-v]
To generate coordinates around a specific point:
python bin/preview.py --point 40.7128 -74.0060 --radius 1000 --spacing 50 -v
To generate coordinates around a specific city:
python bin/preview.py --place "New York" --radius 2000 --jitter 1 2
4.2. Download metadata of actual panoramas using Google Street View API (free)
python bin/download/metadata.py [-h] --database DATABASE [--credentials CREDENTIALS] (--point POINT POINT | --place PLACE) [--jitter LOWER_JITTER UPPER_JITTER] [--radius RADIUS] [--spacing SPACING] [-v]
Use the same parameters as for the "bin/preview.py" script.
4.3. Generate an HTML map with the positions of the actual panoramas
python bin/postview.py [-h] --database DATABASE [--credentials CREDENTIALS] [--zoom ZOOM] [--output OUTPUT]
Use the same "database" parameter as for the "bin/download/metadata.py" script.
4.4. Update metadata files with altitude info using Google Elevation API (priced)
python bin/download/elevation.py [-h] --database DATABASE [--credentials CREDENTIALS]
4.5. Download panoramas and update metadata files using Google Street View API (priced)
python bin/download/panoramas.py [-h] --database DATABASE [--credentials CREDENTIALS] [--fov FOV] [--n_directions N_DIRECTIONS]
4.6. Precompute global feature descriptors based on panoramas for image retrieval
python bin/generate_global_descriptors.py [-h] --database DATABASE [--descriptor_type DESCRIPTOR_TYPE] [--backbone BACKBONE] [--device DEVICE]
4.7. Precompute local feature descriptors based on panoramas for feature matching
python bin/generate_local_descriptors.py [-h] --database DATABASE [--max_keypoints MAX_KEYPOINTS] [--keypoint_threshold KEYPOINT_THRESHOLD] [--nms_radius NMS_RADIUS] [--device DEVICE]
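To give a feel for what the --point, --radius and --spacing options in step 4.1 describe, the sketch below lays a regular grid of candidate coordinates over a disc around a centre point. It is not the code in bin/preview.py: the function name `grid_around_point`, the flat-earth approximation, and reading radius and spacing as metres are assumptions. The --jitter option is left out because its exact meaning is not documented here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def grid_around_point(lat, lng, radius_m, spacing_m):
    """Return (lat, lng) pairs on a square grid clipped to a disc.

    Uses a local flat-earth (equirectangular) approximation, which is
    adequate for radii of a few kilometres away from the poles.
    """
    if radius_m < 0 or spacing_m <= 0:
        raise ValueError("radius_m must be >= 0 and spacing_m must be > 0")
    steps = int(radius_m // spacing_m)
    # Conversion factors from metres to degrees at the given latitude.
    deg_per_m_lat = math.degrees(1.0 / EARTH_RADIUS_M)
    deg_per_m_lng = deg_per_m_lat / math.cos(math.radians(lat))
    points = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            north, east = i * spacing_m, j * spacing_m
            # Keep only grid nodes that fall inside the requested radius.
            if math.hypot(north, east) <= radius_m:
                points.append((lat + north * deg_per_m_lat,
                               lng + east * deg_per_m_lng))
    return points


points = grid_around_point(40.7128, -74.0060, radius_m=1000, spacing_m=50)
print(len(points), points[0])
```

A smaller spacing increases the number of candidate points quadratically, which is worth checking in the preview before running the priced download steps.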
.
└── <database name>
├── <database entry uid 1>
│ ├── image.jpg
│ ├── metadata.json
│ │ ├── "w": int,
│ │ ├── "h": int,
│ │ ├── "lat": Optional[float], # latitude
│ │ ├── "lng": Optional[float], # longitude
│ │ ├── "alt": Optional[float], # altitude
│ │ ├── "azn": Optional[float], # azimuth
│ │ ├── "fov": Optional[float], # camera's field of view
│ │ ├── "K": 3x3 float matrix # intrinsic camera calibration
│ │ └── "E": Optional[4x4 float matrix] # extrinsic camera calibration
│ ├── [<optional model specifier prefix>]keypoints.npy
│ ├── <model specifier prefix 1>_descriptor.npy
│ ├── <model specifier prefix 2>_descriptor.npy
│ ├── ....
│ └── <model specifier prefix k>_descriptor.npz
├── <database entry uid 2>
├── <database entry uid 3>
├── <database entry uid 4>
├── ....
└── <database entry uid n>
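To make the layout above concrete, here is a minimal sketch that reads one entry's metadata.json and checks it against the listed fields. The helper names `load_metadata` and `intrinsics_from_fov` are hypothetical and not part of the package. The pinhole relation used to build "K" assumes that "fov" is the horizontal field of view in degrees and that the principal point sits at the image centre; neither is stated in the layout.

```python
import json
import math
import tempfile
from pathlib import Path

REQUIRED = ("w", "h", "K")                          # non-optional in the layout
OPTIONAL = ("lat", "lng", "alt", "azn", "fov", "E")


def load_metadata(entry_dir):
    """Read <entry_dir>/metadata.json and fill absent optional keys with None."""
    with open(Path(entry_dir) / "metadata.json", encoding="utf-8") as fh:
        meta = json.load(fh)
    missing = [key for key in REQUIRED if key not in meta]
    if missing:
        raise KeyError(f"metadata.json is missing required keys: {missing}")
    for key in OPTIONAL:
        meta.setdefault(key, None)
    return meta


def intrinsics_from_fov(w, h, fov_deg):
    """Build a 3x3 pinhole matrix from image size and horizontal FOV (assumed)."""
    f = w / (2.0 * math.tan(math.radians(fov_deg) / 2.0))
    return [[f, 0.0, w / 2.0], [0.0, f, h / 2.0], [0.0, 0.0, 1.0]]


# Round-trip a synthetic entry through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    entry = Path(tmp) / "example_uid"
    entry.mkdir()
    sample = {"w": 640, "h": 480, "fov": 90.0,
              "K": intrinsics_from_fov(640, 480, 90.0)}
    (entry / "metadata.json").write_text(json.dumps(sample), encoding="utf-8")
    meta = load_metadata(entry)

print(meta["K"])
print(meta["lat"])  # absent optional fields come back as None
```

Treating the optional fields as None mirrors the Optional[...] annotations in the layout, so downstream code can test for a missing geotag explicitly.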
Contributions to enhance the functionality and performance of the visual localization pipeline are welcome. If you have any suggestions, bug reports, or feature requests, please open an issue or submit a pull request.
Image retrieval: Deep Visual Geo-localization Benchmark
Image reranking: Understanding Image Retrieval Re-Ranking: A Graph Neural Network Perspective
Feature matching: SuperPoint & SuperGlue
This project is licensed under the MIT License. See the LICENSE file for more information.
For any questions or inquiries, please open an issue.