Trying to predict OpenStreetMap element IDs by using a linear model.
- Shell (tested with Bash)
- R interpreter (tested with R version 2.15.1)
- Enough disk space for daily replication files (about 1.4 GiB / month)
The R scripts rely on a database file that records dates and the highest ID on each of those dates. The database is a plain CSV file with one entry per line, and it is generated automatically by the shell scripts. There is one database file for every data primitive (node, way and relation).
The shell scripts download the daily replication files from the planet server and analyze them. The result of the analysis is then added to the database file.
There are currently two R scripts available: one plots a chart with the date on the X axis and the highest ID on that date on the Y axis, the other predicts the date on which a specific ID will be created. Both scripts can be adjusted to your needs (e.g. which ID should be predicted).
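The analysis step can be illustrated with a minimal sketch. It is not the project's highest-id.sh: the function name `highest_id` is hypothetical, and the sketch assumes the replication file is a gzip-compressed osmChange XML file in which `id` is the first attribute of each element.

```shell
#!/usr/bin/env bash
# Sketch only: highest_id is a hypothetical helper, not the project's script.
# Assumes elements look like <node id="123" ...> with id as first attribute.
# Usage: highest_id <node|way|relation> <file.osc.gz>
highest_id() {
    local primitive=$1 file=$2
    gzip -dc "$file" |
        # Keep only the opening tags of the requested primitive.
        grep -o "<${primitive} id=\"[0-9]*\"" |
        # Strip everything except the digits of the ID.
        sed 's/[^0-9]//g' |
        # Sort numerically and keep the largest value.
        sort -n | tail -n 1
}
```

Called as `highest_id node 123.osc.gz`, it prints a single number, or nothing if the file contains no element of that type.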
The database is generated and updated by the run-updater.sh script. It checks the current sequence number on the planet server by downloading the state file. If the sequence number on the server is greater than that of the newest locally available replication file, it downloads the missing replication files until no newer daily replication is available. After each download, the replication file is analyzed to find the highest ID for each primitive, and the result is added to the database.
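The sequence handling described above can be sketched as follows. The helper names are hypothetical and not taken from the project. The sketch assumes the state file contains a `sequenceNumber=<n>` line and that replication files are stored under a zero-padded three-level path, which is how the public replication directories appear to be laid out.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not the project's scripts.

# Extract the sequence number from state file contents read on stdin.
parse_sequence() {
    sed -n 's/^sequenceNumber=\([0-9][0-9]*\)$/\1/p'
}

# Map a sequence number to its replication path, e.g. 1234 -> 000/001/234.
sequence_path() {
    local padded
    padded=$(printf '%09d' "$1")
    printf '%s/%s/%s\n' "${padded:0:3}" "${padded:3:3}" "${padded:6:3}"
}

# Print every sequence number after the local one, up to the remote one.
missing_sequences() {
    local local_seq=$1 remote_seq=$2 seq
    for ((seq = local_seq + 1; seq <= remote_seq; seq++)); do
        echo "$seq"
    done
}
```

With these pieces, an updater loop would iterate over `missing_sequences "$local" "$remote"` and fetch `$BASE_URL/$(sequence_path "$seq").osc.gz` for each number. If the local and remote numbers are equal, the loop body never runs.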
The shell scripts accept a few parameters, set as environment variables, that change their behaviour:
DATA_PATH
: Storage location of the replication and database files.
  - Default: .
BASE_URL
: URL of the replication files and planet server.
  - Default: http://planet.osm.org/replication/day
NO_DL
: If set, nothing is downloaded. This is useful if you already have the replication files from elsewhere and only want to analyze them to populate the database.
env DATA_PATH=/media/somewhere/replication-files ./run-updater.sh
env DATA_PATH=/media/osm/replication/daily NO_DL=1 ./run-updater.sh
env BASE_URL=http://planet-mirror/daily-replications ./run-updater.sh
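For readers adapting the scripts, the following sketch shows one common way a Bash script can honour such parameters. It is an illustration rather than the project's actual code, and the function name `downloads_enabled` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: one common way to apply defaults, not the project's code.

# Use the caller's value if present, otherwise fall back to the default.
: "${DATA_PATH:=.}"
: "${BASE_URL:=http://planet.osm.org/replication/day}"

# NO_DL counts as set even when it is empty (NO_DL= ./script.sh).
downloads_enabled() {
    [ -z "${NO_DL+set}" ]
}

if downloads_enabled; then
    echo "would download from $BASE_URL into $DATA_PATH"
else
    echo "skipping downloads, analysing files in $DATA_PATH"
fi
```

The `: "${VAR:=default}"` idiom assigns the default only when the variable is unset or empty, so values passed with `env` take precedence.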
download-sequence.sh
: Downloads a specific state & replication file.
get-latest-sequence.sh
: Outputs the latest sequence number on the server.
highest-id.sh
: Outputs the highest ID for a specific primitive & file.
run-updater.sh
: Main program script which calls all other scripts.
update-db.sh
: Populates the database on a specific sequence number.
linear-prediction.r
: Predicts the date of a specific ID using a linear model.
plot.r
: Plots the database.
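To make the idea behind the prediction concrete, here is a rough sketch of a least-squares line fit written in shell and awk. It is not a port of linear-prediction.r: `predict_date` is a hypothetical name, the `YYYY-MM-DD,ID` line format is an assumption about the database file, and GNU `date` is required.

```shell
#!/usr/bin/env bash
# Sketch only: predict_date is hypothetical, and the "YYYY-MM-DD,ID" input
# format is an assumption about the database file. Requires GNU date.
# Usage: predict_date <target-id> < database.csv
predict_date() {
    local target=$1 first="" points="" day id epoch days
    while IFS=, read -r day id; do
        [ -n "$day" ] || continue                  # skip blank lines
        epoch=$(date -u -d "$day" +%s) || return 1
        [ -n "$first" ] || first=$epoch
        # x = days since the first entry, y = highest ID on that day.
        points+="$(( (epoch - first) / 86400 )) $id"$'\n'
    done
    # Ordinary least squares fit y = slope * x + intercept, solved for x.
    days=$(printf '%s' "$points" | awk -v target="$target" '
        { n++; sx += $1; sy += $2; sxx += $1 * $1; sxy += $1 * $2 }
        END {
            denom = n * sxx - sx * sx
            if (denom == 0) exit 1      # fewer than two distinct days
            slope = (n * sxy - sx * sy) / denom
            if (slope == 0) exit 1      # IDs never grow, no crossing point
            intercept = (sy - slope * sx) / n
            printf "%.0f", (target - intercept) / slope
        }') || return 1
    # Convert the day offset back into a calendar date.
    date -u -d "@$(( first + days * 86400 ))" +%F
}
```

Dates are converted to day offsets from the first entry before fitting, which keeps the sums small enough for awk's floating-point arithmetic. The function returns a non-zero status when a line cannot be fitted.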