IIT-PAVIS / Flatlandia

Dataset and Baselines for "You are here! Finding position and orientation on a 2D map from a single image: The Flatlandia localization problem and dataset"

The Flatlandia Dataset

Introduction

We introduce the Flatlandia dataset and the problem it targets: visual localization from object detections and annotated object maps. Given a visual query in which common urban objects (e.g., benches, streetlights, signs) are detected, and a 2D map of the area annotated with the locations of similar urban objects, we want to recover the pose of the visual query on the map, expressed as a 2D location (latitude/longitude) and an angle (orientation).

Solving this problem would make it possible to better exploit the wide availability of open urban maps annotated with the GPS locations of common objects (e.g., via surveying or crowd-sourcing). Such maps are also more storage-friendly than the large-scale 3D models often used in visual localization, while additionally being privacy-preserving. As existing datasets are unsuited for the proposed problem, we designed a novel dataset for 3DoF visual localization, based on the crowd-sourced data available in Mapillary for five European cities.
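To make the 3DoF output concrete, here is a minimal sketch of how a predicted pose could be compared with a ground-truth one. It is not the evaluation protocol of the paper: the function names are hypothetical, the position error uses the standard haversine distance between two latitude/longitude pairs, and the orientation is assumed to be given in degrees, which this README does not specify.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres


def position_error_m(lat1, lon1, lat2, lon2):
    """Haversine distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def orientation_error_deg(theta_pred, theta_true):
    """Smallest absolute difference between two angles, in [0, 180] degrees.

    Assumption: both angles are expressed in degrees.
    """
    diff = (theta_pred - theta_true) % 360.0
    return min(diff, 360.0 - diff)
```

The wrap-around in `orientation_error_deg` matters because a prediction of 350 degrees against a ground truth of 10 degrees is off by 20 degrees, not 340.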

The code in this repository is part of the paper:
"You are here! Finding position and orientation on a 2D map from a single image: The Flatlandia localization problem and dataset."
Matteo Toso, Matteo Taiana, Stuart James and Alessio Del Bue.
arXiv preprint arXiv:2304.06373 (2023).

The Flatlandia dataset is published under the MIT license.

If you use this code in your research, please cite it as:

@inproceedings{toso2023you,
Title = {You are here! Finding position and orientation on a 2D map from a single image: The Flatlandia localization problem and dataset},
Author = {Toso, Matteo and Taiana, Matteo and James, Stuart and Del Bue, Alessio},
booktitle = {arXiv preprint arXiv:2304.06373},
Year = {2023},
}

Project set up

We developed Flatlandia on the Ubuntu operating system, but we expect that it can be run on other operating systems.

Set up the Conda environment

git clone git@github.com:IIT-PAVIS/Flatlandia
cd Flatlandia
conda create -n flatlandia python=3.7
conda activate flatlandia
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
pip install matplotlib mapillary 
conda install -c dglteam dgl-cuda11.6 
conda install wandb --channel conda-forge

Set up Mapillary access

The visual queries used in the Flatlandia dataset were obtained from, and belong to, Mapillary. To visualize them, we rely on the official Mapillary API; follow the instructions provided at Mapillary API to obtain an access token. This token must then be added to scripts/utils/common.py under mapillary_access_token.
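If you prefer not to hard-code the token, one option is to read it from an environment variable. The sketch below is only a suggestion: the helper name `read_mapillary_token` and the variable name `MAPILLARY_ACCESS_TOKEN` are hypothetical and not part of this repository.

```python
import os


def read_mapillary_token(env_var="MAPILLARY_ACCESS_TOKEN"):
    """Return the Mapillary access token stored in an environment variable.

    Both the helper and the default variable name are illustrative
    assumptions; the repository only requires that the token ends up in
    mapillary_access_token inside scripts/utils/common.py.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {env_var} is not set; "
            "obtain a token from Mapillary and export it first."
        )
    return token
```

With such a helper available, scripts/utils/common.py could assign `mapillary_access_token = read_mapillary_token()` instead of a literal string, which keeps the token out of version control.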

The Flatlandia dataset

The Flatlandia dataset provides a series of visual queries sampled from crowd-sourced street-level Mapillary images, each annotated with a set of object detections (2D bounding boxes and class labels). These queries are sampled from 20 areas across Europe, and for each area we provide a reference map: a 2D map with the location (latitude and longitude) and class of the objects present in the scene. The example plot shows a reference map in Vienna (Left), a visual query present in it (Top Right), and a zoomed-in, camera-centric map with only the objects observed in the query.
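A camera-centric map can be understood as the reference map re-expressed in the camera's own frame: translate every object by the camera position, then rotate by the negative of the camera orientation. The sketch below illustrates this under loud assumptions that are not confirmed by this README: coordinates are already in a local planar frame (not raw latitude/longitude), and the orientation is in radians, measured counterclockwise from the x-axis. The function name `to_camera_frame` is hypothetical.

```python
import math


def to_camera_frame(points_xy, camera_xy, theta):
    """Express planar map points in a frame centred on the camera.

    Assumptions (illustrative only): points are in a local planar frame,
    theta is in radians, counterclockwise from the x-axis. In the output,
    the camera sits at the origin and looks along the positive x-axis.
    """
    cx, cy = camera_xy
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = []
    for x, y in points_xy:
        dx, dy = x - cx, y - cy  # translate so the camera is the origin
        # Rotate by -theta so the viewing direction maps onto +x.
        out.append((cos_t * dx + sin_t * dy, -sin_t * dx + cos_t * dy))
    return out
```

For example, with the camera at (1, 1) facing the positive y direction (theta = pi/2), an object at (1, 3) ends up two units straight ahead of the camera.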

The core Flatlandia dataset is stored in JSON format under data/flatlandia.json and can be accessed with a torch dataloader:

from scripts.utils.dataloader import FlatlandiaLoader 
dataset = FlatlandiaLoader()
for problem in dataset:
    ...

Each dataset entry is a JSON object containing:

  • reference_map: the id of the Flatlandia scene (integer in the range 0-19)
  • reference_xy: the latitude and longitude of each object in the reference map
  • reference_class: the class label of each object, encoded as an integer (see scripts/dataloader.id_to_scene for conversion)
  • query_token: the unique Mapillary token associated with the visual query
  • query_xy: the location of the camera in the reference map
  • query_theta: the orientation of the camera
  • query_matches: for each detected object, its index in the list of reference map objects
  • query_detections: the location of each detection on the image, as the top left and bottom right corners of a bounding box
  • intrinsics: the intrinsic parameters of the camera that acquired the visual query
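The fields above can be combined as in the following sketch, which pairs each detection with its matched reference object and computes bounding-box centres. The entry shown is a toy dictionary with made-up values, not real Flatlandia data, and the box layout `[x_min, y_min, x_max, y_max]` is an assumption about how the two corners are stored. Both helper names are hypothetical.

```python
def bbox_centers(detections):
    """Centre of each box, assuming the layout [x_min, y_min, x_max, y_max]."""
    return [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in detections]


def matched_reference_objects(entry):
    """For each detection, return (class label, map location) of its match."""
    return [
        (entry["reference_class"][i], tuple(entry["reference_xy"][i]))
        for i in entry["query_matches"]
    ]


# Toy entry with invented values, using only a subset of the fields.
toy_entry = {
    "reference_xy": [[48.2000, 16.3700], [48.2001, 16.3702], [48.2003, 16.3699]],
    "reference_class": [3, 7, 3],
    "query_matches": [2, 0],
    "query_detections": [[10, 20, 30, 60], [100, 50, 140, 90]],
}

centers = bbox_centers(toy_entry["query_detections"])
matches = matched_reference_objects(toy_entry)
```

Here the first detection is matched to reference object 2 and the second to reference object 0, so `matches` lists their class labels and map locations in detection order.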

Each dataset entry can be visualized, as in the above image, using the function visualize_problem(x) defined in scripts.utils.dataloader.

Additional content

In addition to the Flatlandia dataset, we provide:

  • SfM reconstructions of the Flatlandia scenes (data/README.MD)
  • Code exemplifying the use of the dataset (scripts/README.MD)

Acknowledgement

This code was developed as part of the MEMEX project, and has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 870743.
