mibe / osm-id-prediction

Trying to predict OpenStreetMap element IDs

osm-id-prediction

Trying to predict OpenStreetMap element IDs by using a linear model.

Requirements

  • Shell (tested with Bash)
  • R interpreter (tested with R version 2.15.1)
  • Enough disk space for daily replication files (about 1.4 GiB / month)

Details

The R scripts rely on a database file that records, for each date, the highest ID in use on that date. The database is a plain CSV file with one entry per line, and it is generated automatically by the shell scripts. There is one database file per data primitive (node, way and relation).
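As a rough illustration of such a database file, the sketch below appends one entry per date to a CSV file. The `date,ID` column order, the comma separator and the function name `append_entry` are assumptions made for this example; the actual layout written by the shell scripts may differ.

```shell
#!/usr/bin/env bash
# Sketch only: append one "date,highest-id" row to a per-primitive CSV file.
# Column order, separator and the function name are assumptions.

append_entry() {
    local db=$1 day=$2 highest_id=$3
    # Skip the row if this date is already recorded, so re-runs add nothing.
    if [ -f "$db" ] && grep -q "^${day}," "$db"; then
        return 0
    fi
    printf '%s,%s\n' "$day" "$highest_id" >> "$db"
}
```

Calling `append_entry` twice with the same date leaves the file unchanged the second time, which keeps the file at one entry per date even if an update is repeated.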

The shell scripts download and analyze the daily replication files on the planet server. The result of the analysis is then added to the database file.

R scripts

Two scripts are currently available: one plots a chart with the date on the X axis and the highest ID on that date on the Y axis; the other predicts the date on which a specific ID will be created. Both scripts can be adjusted to specific needs (e.g. which ID should be predicted).
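The prediction itself is done in R. Purely to illustrate the idea of a linear model, the following shell/awk sketch fits a least-squares line through (day index, highest ID) and solves it for a target ID. It assumes rows of the form `date,ID` with exactly one row per consecutive day; the function name `predict_days_until` is hypothetical, and this is not a port of linear-prediction.r.

```shell
#!/usr/bin/env bash
# Sketch only: least-squares line through (day index, highest ID), then
# solve for the day index at which a target ID is reached.
# Assumptions: rows look like "date,ID", one row per consecutive day.

predict_days_until() {
    local target=$1 db=$2
    awk -F, -v target="$target" '
        NF >= 2 {
            x = n++              # day index: 0 for the first row
            y = $2 + 0
            sx += x; sy += y; sxx += x * x; sxy += x * y
        }
        END {
            if (n < 2) exit 1    # a line needs at least two points
            slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
            intercept = (sy - slope * sx) / n
            if (slope <= 0) exit 1
            # Days between the last row and the predicted day.
            printf "%.1f\n", (target - intercept) / slope - (n - 1)
        }
    ' "$db"
}
```

The function prints the number of days after the last database entry at which the fitted line reaches the target ID; a negative value means the line reached it before the last entry.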

Database

The database is generated and updated by the run-updater.sh script. It determines the current sequence number on the planet server by downloading the state file. If that sequence number is greater than the sequence number of the newest locally available replication file, the script downloads the missing replication files until no newer daily replication is available. Each downloaded replication file is analyzed to find the highest ID for each primitive, and the result is added to the database.
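The catch-up logic described above can be sketched without any network access. The example assumes that the state file contains a `sequenceNumber=N` line and that replication files are stored under a nine-digit, zero-padded path split into three groups (e.g. `000/001/234.osc.gz`), as is usual for OSM replication directories. The function names are hypothetical and not taken from the scripts in this repository.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; nothing is downloaded here.

# Extract the sequence number from the contents of a state file (stdin).
parse_sequence() {
    sed -n 's/^sequenceNumber=\([0-9][0-9]*\).*$/\1/p'
}

# Map a sequence number to the relative path of its replication file,
# e.g. 1234 -> 000/001/234.osc.gz (assumed directory layout).
sequence_path() {
    printf '%09d\n' "$1" | sed 's|^\(...\)\(...\)\(...\)$|\1/\2/\3.osc.gz|'
}

# Print the paths of all sequences newer than the local one, oldest first.
missing_paths() {
    local next=$(( $1 + 1 )) remote=$2
    while [ "$next" -le "$remote" ]; do
        sequence_path "$next"
        next=$(( next + 1 ))
    done
}
```

With a local sequence of 118 and a remote sequence of 120, `missing_paths 118 120` lists the two files that still need to be fetched and analyzed.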

Parameters

The shell scripts accept some parameters, set as environment variables, that change their behaviour:

  • DATA_PATH: Storage location of the replication and database files.
    • Default: .
  • BASE_URL: URL of the replication files and planet server.
    • Default: http://planet.osm.org/replication/day
  • NO_DL: If set, nothing is downloaded. This is useful if you already have the replication files from elsewhere and only want to analyze them to populate the database.

Usage examples:

    env DATA_PATH=/media/somewhere/replication-files ./run-updater.sh
    env DATA_PATH=/media/osm/replication/daily NO_DL=1 ./run-updater.sh
    env BASE_URL=http://planet-mirror/daily-replications ./run-updater.sh
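One common way for a shell script to pick up such parameters is default expansion. The sketch below uses the documented defaults for DATA_PATH and BASE_URL. Treating NO_DL as "set" only when it is non-empty is an assumption, and the helper name `download_enabled` is hypothetical; the real scripts may handle this differently.

```shell
#!/usr/bin/env bash
# Sketch only: read the documented parameters with their documented defaults.

DATA_PATH="${DATA_PATH:-.}"
BASE_URL="${BASE_URL:-http://planet.osm.org/replication/day}"

# Assumption: NO_DL counts as "set" when it has any non-empty value.
download_enabled() {
    [ -z "${NO_DL:-}" ]
}

if download_enabled; then
    echo "would download from $BASE_URL into $DATA_PATH"
else
    echo "NO_DL set: only analyzing existing files in $DATA_PATH"
fi
```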

Files

Shell scripts

  • download-sequence.sh: Downloads a specific state & replication file.
  • get-latest-sequence.sh: Outputs the latest sequence number on the server.
  • highest-id.sh: Outputs the highest ID for a specific primitive & file.
  • run-updater.sh: Main program script which calls all other scripts.
  • update-db.sh: Populates the database for a specific sequence number.
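To give an idea of what finding the highest ID for a primitive involves, here is a minimal sketch that scans uncompressed osmChange XML on standard input. It relies on `grep -o` (available in GNU and BSD grep) rather than a real XML parser, and it assumes that each element's opening tag and its `id` attribute are on the same line. The function name `highest_id` is hypothetical and this is not the implementation of highest-id.sh.

```shell
#!/usr/bin/env bash
# Sketch only: print the highest id of one primitive type (node, way or
# relation) found in osmChange XML read from stdin.

highest_id() {
    local primitive=$1
    grep -o "<${primitive} [^>]*" |   # opening tags of the requested primitive
        grep -o ' id="[0-9]*"' |      # the id attribute only (not uid=)
        tr -dc '0-9\n' |              # keep digits, one ID per line
        sort -n |
        tail -n 1
}
```

A compressed replication file could be piped in with, for example, `gzip -dc 234.osc.gz | highest_id node`, where the file name is only a placeholder.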

R scripts

  • linear-prediction.r: Predicts the date of a specific ID using a linear model.
  • plot.r: Plots the database.
