r-spatial / sf

Simple Features for R

Home Page: https://r-spatial.github.io/sf/


How to specify which columns from OSM files are read by st_read?

agila5 opened this issue · comments

I have tried to load some data in pbf format downloaded from geofabrik and I want to specify which columns to create (specifically "oneway" and "maxspeed") while sf is reading the file into R.

This is what I have tried with @Robinlovelace (it works on Linux but not on Windows). We don't know how to modify the .ini file in a reproducible way, or how to point GDAL to a different .ini file (see https://gdal.org/drivers/vector/osm.html).

See below for a "reproducible" example:

wy_url = "http://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf"
download.file(wy_url, "wy.osm.pbf")
sf::st_layers("wy.osm.pbf")
roads_country = sf::read_sf("wy.osm.pbf", layer = "lines")
roads_country
sf::read_sf("wy.osm.pbf", layer = "lines", query = "select highway, waterway from lines")

("oneway" and "maxspeed" seem to be missing from your reprex)

Interesting, those seem to be the default columns that GDAL gives from the lines layer:

https://github.com/OSGeo/gdal/blob/86a65322636f7cf15d59cd14de92b7889d150878/gdal/data/osmconf.ini#L38-L48

I guess osmconf.ini needs to be overwritten to access the other columns. Reprex showing that the lit column, for example, is in the .pbf file, but hidden in the other_tags column:

# try with pbf data
wy_url = "http://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf"
download.file(wy_url, "wy.osm.pbf")
sf::st_layers("wy.osm.pbf")
res = sf::st_read("wy.osm.pbf", layer = "lines", stringsAsFactors = FALSE)
names(res)
res$other_tags[1:9]
res = sf::st_read("wy.osm.pbf", layer = "lines", query = "select highway from lines") 
names(res) # worked
res = sf::st_read("wy.osm.pbf", layer = "lines", query = "select lit from lines")
# how to get the lit column?

That results in these outputs:

res$other_tags[1:9]
[1] "\"lit\"=>\"yes\""                                "\"lit\"=>\"yes\""                               
[3] "\"lit\"=>\"yes\""                                "\"lit\"=>\"yes\""                               
[5] "\"lit\"=>\"yes\""                                "\"lit\"=>\"yes\""                               
[7] "\"lit\"=>\"yes\""                                "\"addr:postcode\"=>\"WF3 3HG\""                 
[9] "\"lit\"=>\"yes\",\"addr:postcode\"=>\"WF3 4JJ\""
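Until the column is exposed directly, a tag can be pulled out of the other_tags strings shown above with a regular expression. This is a minimal base R sketch, not part of sf: `get_osm_tag()` is a hypothetical helper name, and it assumes keys contain no regex metacharacters and values contain no escaped double quotes.

```r
# Hypothetical helper: extract the value of one OSM tag from hstore-style
# strings such as "\"lit\"=>\"yes\",\"addr:postcode\"=>\"WF3 4JJ\"".
# Assumes keys have no regex metacharacters and values have no escaped quotes.
get_osm_tag = function(other_tags, key) {
  # Matches "key"=>"value" and captures the value
  pattern = paste0('"', key, '"=>"([^"]*)"')
  out = rep(NA_character_, length(other_tags))
  hit = !is.na(other_tags) & grepl(pattern, other_tags)
  # Keep only the captured value for elements that contain the key
  out[hit] = sub(paste0(".*", pattern, ".*"), "\\1", other_tags[hit])
  out
}

tags = c(
  '"lit"=>"yes"',
  '"addr:postcode"=>"WF3 3HG"',
  '"lit"=>"yes","addr:postcode"=>"WF3 4JJ"',
  NA
)
lit = get_osm_tag(tags, "lit")
postcode = get_osm_tag(tags, "addr:postcode")
```

With an sf object read as above, this could be used as `res$lit = get_osm_tag(res$other_tags, "lit")`. It parses every row in R, so it is likely slower than having GDAL create the column while reading.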

And, as you alluded to:

res = sf::st_read("wy.osm.pbf", layer = "lines", query = "select lit from lines")
Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  : 
  Query execution failed, cannot open layer.
In addition: Warning message:
In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  GDAL Error 1: Unrecognized field name lit.

So maybe the question should be: how can the user modify the osmconf.ini file used by st_read() on an ad hoc basis?

Update: changing that line in /usr/share/gdal/osmconf.ini (edited with sudo nvim) to

attributes=name,highway,waterway,aerialway,barrier,man_made,lit,maxspeed,oneway

results in the following:

wy_url = "http://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf"
download.file(wy_url, "wy.osm.pbf")
sf::st_layers("wy.osm.pbf")
#> Driver: OSM 
#> Available layers:
#>         layer_name       geometry_type features fields
#> 1           points               Point       NA     10
#> 2            lines         Line String       NA     12
#> 3 multilinestrings   Multi Line String       NA      4
#> 4    multipolygons       Multi Polygon       NA     25
#> 5  other_relations Geometry Collection       NA      4
res = sf::st_read("wy.osm.pbf", layer = "lines", stringsAsFactors = FALSE)
#> Reading layer `lines' from data source `/mnt/57982e2a-2874-4246-a6fe-115c199bc6bd/atfutures/itsleeds/snet/wy.osm.pbf' using driver `OSM'
#> Simple feature collection with 167998 features and 12 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -2.737682 ymin: 53.28452 xmax: -0.9407968 ymax: 54.14029
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
res
#> Simple feature collection with 167998 features and 12 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -2.737682 ymin: 53.28452 xmax: -0.9407968 ymax: 54.14029
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#> First 10 features:
#>    osm_id                name     highway waterway aerialway barrier
#> 1  779434       Ruskin Avenue residential     <NA>      <NA>    <NA>
#> 2  779437       Barnes Avenue residential     <NA>      <NA>    <NA>
#> 3  779440   Broom Hall Avenue residential     <NA>      <NA>    <NA>
#> 4  779453 Broom Hall Crescent residential     <NA>      <NA>    <NA>
#> 5  779469         Grey Street residential     <NA>      <NA>    <NA>
#> 6  779485         Newton Lane    tertiary     <NA>      <NA>    <NA>
#> 7  779533          Bolus Lane residential     <NA>      <NA>    <NA>
#> 8  779750      Pollard Street     service     <NA>      <NA>    <NA>
#> 9  779785     Lake Lock Grove residential     <NA>      <NA>    <NA>
#> 10 779849         East Street residential     <NA>      <NA>    <NA>
#>    man_made  lit maxspeed oneway z_order                 other_tags
#> 1      <NA>  yes     <NA>   <NA>       3                       <NA>
#> 2      <NA>  yes     <NA>   <NA>       3                       <NA>
#> 3      <NA>  yes     <NA>   <NA>       3                       <NA>
#> 4      <NA>  yes     <NA>   <NA>       3                       <NA>
#> 5      <NA>  yes     <NA>   <NA>       3                       <NA>
#> 6      <NA>  yes     <NA>   <NA>       4                       <NA>
#> 7      <NA>  yes     <NA>   <NA>       3                       <NA>
#> 8      <NA> <NA>     <NA>   <NA>       0 "addr:postcode"=>"WF3 3HG"
#> 9      <NA>  yes     <NA>   <NA>       3 "addr:postcode"=>"WF3 4JJ"
#> 10     <NA>  yes     <NA>   <NA>       3                       <NA>
#>                          geometry
#> 1  LINESTRING (-1.517876 53.70...
#> 2  LINESTRING (-1.513278 53.70...
#> 3  LINESTRING (-1.516386 53.69...
#> 4  LINESTRING (-1.515565 53.69...
#> 5  LINESTRING (-1.504429 53.70...
#> 6  LINESTRING (-1.505074 53.70...
#> 7  LINESTRING (-1.499692 53.70...
#> 8  LINESTRING (-1.499072 53.71...
#> 9  LINESTRING (-1.471199 53.71...
#> 10 LINESTRING (-1.501178 53.69...
table(res$oneway)
#> 
#>          -1 alternating          no  reversible         yes 
#>          74          87        1347          13        9425
table(res$maxspeed)
#> 
#>   1 mph      10  10 mph 100 mph  15 mph  18 mph      20  20 mph  25 mph 
#>       1       3      92      43      88       1       2    2697      76 
#>      30  30 mph   30mph  35 mph  40 mph  45 mph       5   5 mph      50 
#>       4    6496      19      20    2102      13       3      99       2 
#>  50 mph  55 mph      60  60 mph   60mph  65 mph  70 mph  75 mph  80 mph 
#>     505      12       3     840       1       7     884     180       3 
#>  85 mph  90 mph 
#>      13      20

Created on 2019-09-27 by the reprex package (v0.3.0)

Which closes the issue?

That fixes the issue for me, but the solution will be tricky for some users. The OSM GDAL driver docs say:

In the data folder of the GDAL distribution, you can find a osmconf.ini file that can be customized to fit your needs. You can also define an alternate path with the OSM_CONFIG_FILE configuration option.

Would it be possible to do that from st_read()?

It would be useful to be able to set that config option, e.g. with something like:

st_read("wy.osm.pbf", layer = "lines", OSM_CONFIG_FILE = "my_osmconf.ini")

or

st_read("wy.osm.pbf", layer = "lines", config = list(OSM_CONFIG_FILE = "my_osmconf.ini"))

based on an example from https://wiki.openstreetmap.org/wiki/OGR

 ogr2ogr -overwrite --config OSM_CONFIG_FILE my_osmconf.ini -skipfailures -f "ESRI Shapefile" charentilly charentilly.osm
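GDAL configuration options can generally also be supplied as environment variables, so one possible workaround from R is to set OSM_CONFIG_FILE with Sys.setenv() around the read. The sketch below uses a hypothetical helper, `with_osm_config()`, which is not part of sf. It assumes that the GDAL library linked by sf sees environment changes made after it was loaded, which may not hold on every platform.

```r
# Hypothetical helper: evaluate `code` with the OSM_CONFIG_FILE environment
# variable set to `ini_path`, then restore the previous state.
# Assumption: the GDAL build used by sf reads this variable at read time.
with_osm_config = function(ini_path, code) {
  old = Sys.getenv("OSM_CONFIG_FILE", unset = NA)
  Sys.setenv(OSM_CONFIG_FILE = ini_path)
  on.exit(
    if (is.na(old)) {
      Sys.unsetenv("OSM_CONFIG_FILE")
    } else {
      Sys.setenv(OSM_CONFIG_FILE = old)
    },
    add = TRUE
  )
  # `code` is a lazily evaluated argument, so it runs only here,
  # after the variable has been set
  force(code)
}

# Intended use (needs sf and the files from this thread):
# res = with_osm_config("my_osmconf.ini",
#                       sf::st_read("wy.osm.pbf", layer = "lines"))
```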

See the options argument in the st_read docs and "Open Options" in the driver docs: pass options = "CONFIG_FILE=/path/to/my_osmconf.ini" to st_read.

That works 👍

wy_url = "http://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf"
download.file(wy_url, "wy.osm.pbf")
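# Return the lines of GDAL's default osmconf.ini, with the default attributes
# line replaced by one that also lists the extra tags in `x`.
make_ini_attributes = function(x, defaults = c("name", "highway", "waterway", "aerialway", "barrier", "man_made"), append = TRUE) {
    # The attributes line as it appears in the default osmconf.ini
    attributes_default_ini = paste0("attributes=", paste(defaults, collapse = ","))
    if (append) {
        x = c(defaults, x)
    }
    attributes_default_ini_new = paste0("attributes=", paste(x, collapse = ","))
    # Download the default configuration file from the GDAL repository
    ini_file = readLines("https://github.com/OSGeo/gdal/raw/master/gdal/data/osmconf.ini")
    # Find the line that matches the defaults and swap in the new one
    sel_attributes = grepl(pattern = attributes_default_ini, x = ini_file)
    message("Old attributes: ", ini_file[sel_attributes])
    message("New attributes: ", attributes_default_ini_new)
    ini_file[sel_attributes] = attributes_default_ini_new
    ini_file
}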

ini_new = make_ini_attributes(x = c("oneway", "maxspeed", "foot", "bicycle"))
#> Old attributes: attributes=name,highway,waterway,aerialway,barrier,man_made
#> New attributes: attributes=name,highway,waterway,aerialway,barrier,man_made,oneway,maxspeed,foot,bicycle
writeLines(ini_new, "ini_new.ini")
res = sf::st_read("wy.osm.pbf", layer = "lines", options = "CONFIG_FILE=ini_new.ini")
#> options:        CONFIG_FILE=ini_new.ini 
#> Reading layer `lines' from data source `/mnt/57982e2a-2874-4246-a6fe-115c199bc6bd/atfutures/repos/stplanr/wy.osm.pbf' using driver `OSM'
#> Simple feature collection with 167998 features and 13 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -2.737682 ymin: 53.28452 xmax: -0.9407968 ymax: 54.14029
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
names(res)
#>  [1] "osm_id"     "name"       "highway"    "waterway"   "aerialway" 
#>  [6] "barrier"    "man_made"   "oneway"     "maxspeed"   "foot"      
#> [11] "bicycle"    "z_order"    "other_tags" "geometry"

Created on 2019-09-27 by the reprex package (v0.3.0)
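The helper above finds the line to replace by matching the exact default attributes string, so it would change nothing if the defaults shipped with GDAL differed. A section-aware variant could look like the sketch below. `set_osm_attributes()` is a hypothetical name, and the function assumes the .ini content is already in memory as a character vector (for example from readLines() on a local copy of osmconf.ini) with one attributes= line per [layer] section.

```r
# Hypothetical helper: replace the attributes= line inside one [layer]
# section of an osmconf.ini given as a character vector of lines.
set_osm_attributes = function(ini, layer = "lines", attributes) {
  # Positions of all section headers, e.g. "[points]", "[lines]"
  section_starts = grep("^\\[.*\\][[:space:]]*$", ini)
  start = grep(paste0("^\\[", layer, "\\][[:space:]]*$"), ini)
  stopifnot(length(start) == 1)
  # The section ends just before the next header, or at the end of the file
  later = section_starts[section_starts > start]
  end = if (length(later) > 0) later[1] - 1 else length(ini)
  idx = start:end
  # Commented-out lines such as "#attributes=..." are not matched
  target = idx[grepl("^attributes=", ini[idx])]
  stopifnot(length(target) == 1)
  ini[target] = paste0("attributes=", paste(attributes, collapse = ","))
  ini
}

# Small made-up ini fragment, for illustration only
ini = c(
  "[points]", "osm_id=yes", "attributes=name,barrier",
  "[lines]", "osm_id=yes", "attributes=name,highway",
  "[multipolygons]", "attributes=name,type"
)
ini_new = set_osm_attributes(ini, "lines", c("name", "highway", "oneway", "maxspeed"))

# Intended use with the files from this thread:
# writeLines(ini_new, "ini_new.ini")
# res = sf::st_read("wy.osm.pbf", layer = "lines", options = "CONFIG_FILE=ini_new.ini")
```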

Many thanks for the explanation, really impressed with this functionality.

Great to see this closed and the link to the issue in the documentation. I did search for config in the help page for st_read() and found nothing, so it may be worth adding an example that is not tested. I could put in a PR with that if useful.

On a different note, and out of interest, do you know whether it's possible for Windows computers to read PBF files? I would be very grateful if you could point me towards any info on which drivers are or are not likely to be supported on different systems.

From the GDAL driver docs: "The driver is available if GDAL is built with SQLite support and, for .osm XML files, with Expat support."; from rwinlib/gdal2:

...
  SQLite support:            yes
...
  Expat support:             yes

So that should mean: yes. You can do the same for other drivers.
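Driver availability can also be checked from within R on the machine in question, since sf::st_drivers() lists the drivers of the GDAL build that sf is linked to. A small sketch, where `has_driver()` is a hypothetical convenience wrapper and not an sf function:

```r
# Hypothetical helper: is a given vector driver listed by the GDAL build
# that sf is linked to? `drivers` defaults to sf::st_drivers(), which
# returns a data frame with a `name` column.
has_driver = function(driver, drivers = sf::st_drivers(what = "vector")) {
  driver %in% drivers$name
}

# Only query sf when it is installed
if (requireNamespace("sf", quietly = TRUE)) {
  message("OSM driver available: ", has_driver("OSM"))
}
```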