OpenSextant / Gazetteer

The OpenSextant Gazetteer is a collection of world-wide place name data

OpenSextant Gazetteer

The OpenSextant Gazetteer is a collection of world-wide place name data. It is used by the other OpenSextant projects, such as the OpenSextant geotagger, but it is also useful for any project that needs clean, consistent place name data.

Released Data

Most projects will just want the finished data (Latest Release).

Do it yourself

OpenSextant uses Kettle ETL to process and transform the publicly available gazetteer data into the clean, consistent form used by the OpenSextant geotagger.

  • Get and install Kettle
    • Get it from http://kettle.pentaho.com (tested with versions "4.4.0-stable" and 6.1)
      NOTE: Kettle "5.0.1.A-stable" introduced an intermittent issue reading the Excel files used for reference data, and 6.0.x introduced a bug in the User Defined Java step. Avoid both for now. Version 6.1 writes a large number of inscrutable but harmless warning messages to the log, of the form: "Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp"
  • Configure
    • copy or rename build.local.properties to build.properties and edit:
      • set the "kettle.home" parameter to the directory where you installed Kettle in the first step above
      • set the "proxy.host" and "proxy.port" parameters if you are behind a firewall
      • set the "NGA_date" and "USGS_date" parameters (see build.properties for details)
      • (optional) modify the "kettle.options.jvm" setting to increase or reduce the memory used in gazetteer processing. Setting this below about 1 GB will cause excessive processing times.
      • if using a Kettle version earlier than 6.1, uncomment the line in the copy-libs task to use common codecs
  • Do the build
    • run ant
      This will fetch the data from the two websites (NGA and USGS) and run the Kettle processes, which clean, transform, and zip the finished gazetteer data into the build directory.

Depending on your machine, this whole process can take up to 1.5 hours.
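
Because a full build can run for a long time, it may be worth confirming that build.properties is filled in before starting. The sketch below is not part of the OpenSextant build; `missing_props` is a hypothetical helper that lists any of the given keys that are absent, commented out, or empty in a properties file. The key names come from the configuration steps above; the usage lines in the trailing comments assume you run them from the directory containing build.local.properties.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (an assumption, not shipped with the Gazetteer):
# print each requested key that has no non-empty, uncommented value in the file.
missing_props() {
    file="$1"
    shift
    for key in "$@"; do
        # Escape dots so "kettle.home" is matched literally, not as a wildcard.
        pat=$(printf '%s' "$key" | sed 's/\./\\./g')
        # Accept "key=value" or "key = value"; lines starting with "#" do not match.
        if ! grep -Eq "^[[:space:]]*${pat}[[:space:]]*[=:][[:space:]]*[^[:space:]]" "$file"; then
            printf '%s\n' "$key"
        fi
    done
}

# Typical use, following the steps above:
#   cp build.local.properties build.properties
#   (edit build.properties)
#   missing_props build.properties kettle.home NGA_date USGS_date
#   ant
```

If the helper prints nothing, the three required parameters all have values; it does not check that those values are valid, so see build.properties for the expected formats.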
