SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation

This repo contains code for SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation, published at EMNLP 2022. SpaBERT provides general-purpose geo-entity representations based on neighboring entities in geospatial data. It extends BERT to capture linearized spatial context and incorporates a spatial coordinate embedding mechanism to preserve the spatial relations of entities in two-dimensional space. SpaBERT is pretrained with masked language modeling and masked entity prediction tasks to learn spatial dependencies.
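To make the linearization concrete, here is a minimal sketch in plain Python of how a pivot geo-entity and its neighbors could be turned into a pseudo-sentence, with a per-token distance value that a spatial coordinate embedding layer could consume. The function and field names are hypothetical, not the repo's API:

    # Illustrative only; function and field names are hypothetical, not the repo's API.
    import math

    def linearize(pivot, neighbors, sep_between_neighbors=True):
        """Build a pseudo-sentence: pivot name, then neighbor names ordered by distance."""
        def dist(entity):
            return math.hypot(entity['x'] - pivot['x'], entity['y'] - pivot['y'])

        tokens, dists = [], []
        for word in pivot['name'].split():
            tokens.append(word)
            dists.append(0.0)                    # pivot tokens sit at distance 0
        for nb in sorted(neighbors, key=dist):
            if sep_between_neighbors:            # mirrors the --sep_between_neighbors flag
                tokens.append('[SEP]')
                dists.append(dist(nb))
            for word in nb['name'].split():
                tokens.append(word)
                dists.append(dist(nb))           # all tokens of a neighbor share its distance
        return tokens, dists

    pivot = {'name': 'Los Angeles', 'x': 0.0, 'y': 0.0}
    neighbors = [{'name': 'Long Beach', 'x': 6.0, 'y': 8.0},
                 {'name': 'Pasadena', 'x': 3.0, 'y': 4.0}]
    print(linearize(pivot, neighbors))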

Pretraining

Pretrained model weights can be downloaded from Google Drive for both SpaBERT-base and SpaBERT-large.

Weights can also be obtained by training from scratch using the following sample commands. Data for pretraining can be downloaded here.

  • Code to pretrain SpaBERT-base model:

    python3 train_mlm.py --lr=5e-5 --sep_between_neighbors --bert_option='bert-base'

  • Code to pretrain SpaBERT-large model:

    python3 train_mlm.py --lr=1e-6 --sep_between_neighbors --bert_option='bert-large'
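Pretraining mixes token-level masked language modeling with masked entity prediction, where all tokens of one neighbor entity are masked jointly. A hedged sketch of that masking step (the token and span structures here are illustrative, not the repo's data format):

    # Illustrative sketch of masked entity prediction: mask every token of one
    # randomly chosen neighbor entity, rather than random individual tokens.
    import random

    MASK = '[MASK]'

    def mask_one_entity(tokens, entity_spans, rng=random):
        """entity_spans: (start, end) token-index ranges, one per neighbor entity."""
        start, end = rng.choice(entity_spans)
        masked = list(tokens)
        labels = [None] * len(tokens)        # None = not a prediction target
        for i in range(start, end):
            labels[i] = tokens[i]            # target is the original token
            masked[i] = MASK
        return masked, labels

    tokens = ['Los', 'Angeles', '[SEP]', 'Pasadena', '[SEP]', 'Long', 'Beach']
    spans = [(3, 4), (5, 7)]                 # token ranges of the two neighbors
    print(mask_one_entity(tokens, spans, random.Random(0)))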

Downstream Tasks

Supervised Geo-entity Typing

The goal is to predict a geo-entity's semantic type (e.g., transportation or healthcare) given the target geo-entity's name and its spatial context (i.e., the names and locations of surrounding neighbors).
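Conceptually, the typing model adds a small classification head over the pretrained encoder's output at the pivot entity's tokens. A minimal sketch assuming PyTorch; the class name, pooling strategy, and number of semantic types are illustrative, not the repo's exact API:

    import torch
    import torch.nn as nn

    class GeoEntityTypingHead(nn.Module):
        """Linear classifier over the pooled encoder output of the pivot entity's tokens."""
        def __init__(self, hidden_size=768, num_types=9):
            super().__init__()
            self.dropout = nn.Dropout(0.1)
            self.classifier = nn.Linear(hidden_size, num_types)

        def forward(self, encoder_output, pivot_mask):
            # encoder_output: (batch, seq_len, hidden); pivot_mask: (batch, seq_len),
            # 1 where a token belongs to the pivot entity, 0 elsewhere.
            mask = pivot_mask.unsqueeze(-1).float()
            pooled = (encoder_output * mask).sum(1) / mask.sum(1).clamp(min=1.0)
            return self.classifier(self.dropout(pooled))  # logits over semantic types

    logits = GeoEntityTypingHead()(torch.randn(2, 16, 768), torch.ones(2, 16))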

Models trained on OSM in the London and California regions can be downloaded from Google Drive for SpaBERT-base and SpaBERT-large.

Data used for training and testing can be downloaded here.

  • Sample code for training the SpaBERT-base typing model:

    python3 train_cls_spatialbert.py --lr=5e-5 --sep_between_neighbors --bert_option='bert-base' --with_type --mlm_checkpoint_path='mlm_mem_keeppos_ep0_iter06000_0.2936.pth'

  • Sample code for training the SpaBERT-large typing model:

    python3 train_cls_spatialbert.py --lr=1e-6 --sep_between_neighbors --bert_option='bert-large' --with_type --mlm_checkpoint_path='mlm_mem_keeppos_ep1_iter02000_0.4400.pth' --epochs=20

Unsupervised Geo-entity Linking

Geo-entity linking matches geo-entities from a geographic information system (GIS)-oriented dataset against a knowledge base (KB). The task is unsupervised and thus requires no further training; the pretrained models can be used directly.
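Conceptually, linking amounts to ranking KB candidates by the similarity of their geo-entity embeddings to the query entity's embedding. A minimal sketch of that ranking step, assuming the embeddings were already produced by the pretrained encoder; cosine similarity is one plausible metric and the function name is hypothetical:

    import numpy as np

    def rank_candidates(query_vec, candidate_vecs, candidate_ids):
        """Rank KB candidates for one geo-entity by cosine similarity."""
        q = query_vec / np.linalg.norm(query_vec)
        c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
        scores = c @ q                        # one cosine score per candidate
        order = np.argsort(-scores)           # best match first
        return [(candidate_ids[i], float(scores[i])) for i in order]

    rng = np.random.default_rng(0)
    print(rank_candidates(rng.normal(size=8), rng.normal(size=(3, 8)), ['Q1', 'Q2', 'Q3']))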

Linking with SpaBERT-base

python3 unsupervised_wiki_location_allcand.py --model_name='spatial_bert-base' --sep_between_neighbors \
 --spatial_bert_weight_dir='weights/' --spatial_bert_weight_name='mlm_mem_keeppos_ep0_iter06000_0.2936.pth'

Linking with SpaBERT-large

python3 unsupervised_wiki_location_allcand.py --model_name='spatial_bert-large' --sep_between_neighbors \
 --spatial_bert_weight_dir='weights/' --spatial_bert_weight_name='mlm_mem_keeppos_ep1_iter02000_0.4400.pth'

Data used for linking geo-entities from USGS historical maps to the WikiData KB is provided here.

Citation

@inproceedings{li2022spabert,
  title={SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation},
  author={Li, Zekun and Kim, Jina and Chiang, Yao-Yi and Chen, Muhao},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2022}
}
