Awesome-Geo-localization

A Paper List for Geo-localization Research.

Given a single street-view or scenery image, the geo-localization task is to predict the actual location of the image (continent / country / region / city / street / geographic coordinates).

Awesome Papers

Retrieval Based Methods

Retrieval-based methods treat geo-localization as a retrieval task. The input image serves as a query, and the goal is to match it to the most similar image in a worldwide gallery of photos. Alternatively, the input image can be matched to the most similar location in a worldwide gallery of locations.
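The retrieval idea above can be sketched in a few lines. This is a minimal illustration, not the method of any listed paper: the embeddings are made-up 3-dimensional toy vectors (a real system would obtain them from an image encoder), and the helper names `cosine_similarity` and `retrieve_location` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_location(query_embedding, gallery):
    """Return the (lat, lon) of the gallery entry most similar to the query.

    gallery: list of (embedding, (lat, lon)) pairs.
    """
    best = max(gallery, key=lambda item: cosine_similarity(query_embedding, item[0]))
    return best[1]

# Toy gallery: hand-written embeddings paired with coordinates (assumed values).
gallery = [
    ([0.9, 0.1, 0.0], (48.8584, 2.2945)),    # a photo taken in Paris
    ([0.1, 0.9, 0.1], (40.6892, -74.0445)),  # a photo taken in New York
    ([0.0, 0.2, 0.9], (35.6586, 139.7454)),  # a photo taken in Tokyo
]

query = [0.8, 0.2, 0.1]  # embedding of the input image (assumed)
predicted = retrieve_location(query, gallery)
print(predicted)
```

The prediction is simply the location attached to the nearest gallery entry, so accuracy depends on how densely the gallery covers the world.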

Title	Venue	Code	Demo
R2Former: Unified Retrieval and Reranking Transformer for Place Recognition	CVPR 2023	Github	-
GeoCLIP: CLIP-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization	NeurIPS 2023	Github	-
VIGOR: Cross-View Image Geo-Localization Beyond One-to-One Retrieval	CVPR 2021	Github	-
Cross-view Geo-localization with Layer-to-Layer Transformer	NeurIPS 2021	Github	-
Where Am I Looking At? Joint Location and Orientation Estimation by Cross-View Matching	CVPR 2020	Github	-
Large-Scale Image Geo-Localization Using Dominant Sets	TPAMI 2019	-	-
Revisiting IM2GPS in the Deep Learning Era	ICCV 2017	-	-
IM2GPS: estimating geographic information from a single image	CVPR 2008	-	-

Classification Based Methods

Classification-based methods treat geo-localization as a classification task. They subdivide the earth's surface into a large number of geo-cells (at different granularities: continent, country, region, city, street, ...) and assign each input image to one geo-cell.
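As a rough sketch of the geo-cell idea, the snippet below partitions the globe into a uniform latitude/longitude grid and turns a classifier's per-cell scores into a coordinate prediction. The uniform 10-degree grid and the function names (`cell_index`, `cell_center`, `predict_location`) are assumptions chosen for simplicity. They are not taken from any listed paper, and real systems may use other partitions.

```python
def cell_index(lat, lon, cell_deg=10.0):
    """Map a coordinate to the index of its grid cell (row-major order)."""
    n_rows = int(180 / cell_deg)
    n_cols = int(360 / cell_deg)
    # Clamp so that lat = 90 and lon = 180 fall into the last row/column.
    row = min(int((lat + 90.0) // cell_deg), n_rows - 1)
    col = min(int((lon + 180.0) // cell_deg), n_cols - 1)
    return row * n_cols + col

def cell_center(index, cell_deg=10.0):
    """Return the (lat, lon) at the center of a grid cell."""
    n_cols = int(360 / cell_deg)
    row, col = divmod(index, n_cols)
    return (-90.0 + (row + 0.5) * cell_deg, -180.0 + (col + 0.5) * cell_deg)

def predict_location(class_scores, cell_deg=10.0):
    """Pick the highest-scoring cell and report its center as the prediction."""
    best_cell = max(range(len(class_scores)), key=lambda i: class_scores[i])
    return cell_center(best_cell, cell_deg)

# Training label for a photo taken in Paris (48.8584 N, 2.2945 E).
label = cell_index(48.8584, 2.2945)

# Stand-in for a classifier output: one score per cell (18 x 36 = 648 cells).
scores = [0.0] * 648
scores[label] = 1.0
print(label, predict_location(scores))
```

Because the prediction is a cell center, the cell size bounds the best achievable precision, which is one reason to work at several granularities.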

Title	Venue	Code	Demo
Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes	CVPR 2023	Github	-
Rethinking Visual Geo-Localization for Large-Scale Applications	CVPR 2022	Github	-
Where in the World is this Image? Transformer-based Geo-localization in the Wild	ECCV 2022	Github	-
Geolocation Estimation of Photos using a Hierarchical Model and Scene Classification	ECCV 2018	Github	-

Multi-modal Learning Methods for Classification and Generation

With multi-modal learning methods such as CLIP (Contrastive Language-Image Pre-training), models can learn the relation between representations of location labels and images, and then predict the location more accurately with classification or generation methods.
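To illustrate how image and label representations can be compared, here is a minimal sketch of CLIP-style zero-shot scoring: cosine similarities between one image embedding and several label embeddings are turned into probabilities with a softmax. All vectors are made-up toy values rather than real encoder outputs, the temperature of 0.07 is an assumed setting, and `label_probabilities` is a hypothetical helper, not part of any library.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def label_probabilities(image_emb, label_embs, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities to each label."""
    img = normalize(image_emb)
    logits = {}
    for label, emb in label_embs.items():
        txt = normalize(emb)
        logits[label] = sum(a * b for a, b in zip(img, txt)) / temperature
    # Subtract the maximum logit for numerical stability.
    top = max(logits.values())
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Toy embeddings of text labels (assumed values, e.g. for country names).
label_embs = {
    "France": [0.9, 0.1, 0.0],
    "Japan": [0.0, 0.2, 0.9],
    "Brazil": [0.1, 0.9, 0.1],
}

image_emb = [0.8, 0.2, 0.1]  # embedding of the input image (assumed)
probs = label_probabilities(image_emb, label_embs)
best_label = max(probs, key=probs.get)
print(best_label)
```

The same scoring works for any label set, which is why no retraining is needed when the candidate locations change.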

Title	Venue	Code	Demo
GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model	ICML 2024	-	-
G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models	arXiv 2024	-	-
Img2Loc: Revisiting Image Geolocalization using Multi-modality Foundation Models and Image-based Retrieval-Augmented Generation	arXiv 2024	-	-
PIGEON: Predicting Image Geolocations	CVPR 2024	Github	-
Learning Generalized Zero-Shot Learners for Open-Domain Image Geolocalization	arXiv 2023	HuggingFace	HuggingFace
IM2City: image geo-localization via multi-modal learning	GeoAI 2022	-	-
G^3: Geolocation via Guidebook Grounding	EMNLP 2022 Findings	Github	-

Survey

Title	Venue	Code	Demo
Image and Object Geo-Localization	IJCV 2023	-	-

Datasets and Benchmarks

Title	Venue	Code	Demo
OpenStreetView-5M: The Many Roads to Global Visual Geolocation	CVPR 2024	Github	-
LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild	CVPR 2024 Workshop	-	-
CityBench: Evaluating the Capabilities of Large Language Model as World Model	arXiv 2024	-	-

Commonly used test sets

Title	Link	Source Link
OpenStreetView-5M	DownloadLink	-
im2GPS3k	DownloadLink	SourceLink
YFCC4K	DownloadLink	-
YFCC26K	DownloadLink	SourceLink1 SourceLink2

Awesome Demos

Geospy.ai
