edzer / sp

Classes and methods for spatial data

Home Page: http://edzer.github.io/sp/

SpatialMultiPoint ID?

rsbivand opened this issue

This happens in rgeos:

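library(maptools)
data(wrld_simpl)
# Colombia and Venezuela polygons from the bundled world map
colven <- wrld_simpl[wrld_simpl$NAME %in% c("Colombia", "Venezuela"), ]
col <- colven[colven$NAME == "Colombia", ]
ven <- colven[colven$NAME == "Venezuela", ]
library(raster)
# 100 x 100 raster over both countries, used only to draw random cell centres
raster <- raster(colven, nrow=100, ncol=100)
set.seed(33)
cells <- sample(ncell(raster), 10)
xy <- xyFromCell(raster, cells)
sp <- SpatialPoints(xy, proj4string=crs(colven))
library(rgeos)
# keep only the points inside Colombia (gIntersection uses byid=FALSE by default)
sp <- gIntersection(sp, col)
# shared Colombia/Venezuela border
border <- gIntersection(col, ven)
# with byid=TRUE, one distance per point is expected
gDistance(sp, border, byid=TRUE)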

(Robert Hijmans)

It is caused by GEOS returning a MultiPoint, which is flattened to a SpatialPoints object. As byid= was FALSE, only one object is to be returned, so all the coordinates get the same rownames() value (obviously wrong). I'm fixing that by setting the rownames() to R_NilValue when the issue is detected. This isn't great - it would be better to return a SpatialMultiPoints object, possibly with an ID associated with each coordinate matrix in the list. Adding a comment to each matrix would be a non-invasive hack, but not pretty (and SFR is trying to back out of my last use of comments). This is an SFR thing too - could you link this issue to SFR?

Setting the rownames to R_NilValue broke readWKT for MULTIPOINT objects: they were wrong before (all rownames identical), but with NULL rownames they break in gIsSimple.

The problem seems to start here:

> sp <- gIntersection(sp, col)
> row.names(sp)
[1] "1" "1" "1" "1"

Yes, I've traced it further than that: it's in rgeos_geospoint2crdMat in src/rgeos_coord.c, which assumes that the calling functions have passed the id through correctly. They don't do that for MULTIPOINT, and a MULTIPOINT can come from many origins (topology operations, readWKT, etc.). I don't know why gIsSimple breaks when rownames are NULL; I may look at that too.

rgeos rev 76 May 2010: "Currently point ids are stored as row names which is a design decision that should be revisited later (current implementation works, but probably doesn't integrate well with existing sp tools)."

If rownames are NULL, a GEOMETRYCOLLECTION of POINT objects is created, not a MULTIPOINT. If I don't fix Robert's issue, the number of unique rownames is 1 and we get a MULTIPOINT; if it is greater than 1, we get a GEOMETRYCOLLECTION of POINT objects. So to fix Robert's issue we need to find a way to resolve this. One way is to use SpatialMultiPoints and add switch points to suit; another is to try to get NULL rownames to work. Yet another is to accept the non-unique rownames (which are accepted elsewhere), but trap the cases where they trigger a problem in output that needs unique names.
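To make the row-name convention concrete, here is a minimal base-R sketch that mimics the rule as described in this thread: rows sharing a row name are grouped into one MULTIPOINT, a group of one row becomes a POINT, and more than one group yields a GEOMETRYCOLLECTION. `coords_to_wkt` is a hypothetical helper, not part of rgeos; it models the observed WKT output (including the `1:3` row-name case shown later in the thread), not the actual C code in `rgeos_R2geos.c`, and it assumes NULL row names mean one point per row.

```r
# Hypothetical helper (not part of rgeos): mimics the row-name convention
# described in this thread for turning a point coordinate matrix into WKT
coords_to_wkt <- function(crds) {
  ids <- rownames(crds)
  # assumption: NULL row names are treated as one id per row
  if (is.null(ids)) ids <- as.character(seq_len(nrow(crds)))
  pts <- paste(crds[, 1], crds[, 2])
  # group points by row name, keeping groups in order of first appearance
  groups <- split(pts, factor(ids, levels = unique(ids)))
  parts <- vapply(groups, function(g) {
    if (length(g) == 1) paste0("POINT (", g, ")")
    else paste0("MULTIPOINT (", paste(g, collapse = ", "), ")")
  }, character(1))
  # a single group is returned on its own; several become a collection
  if (length(parts) == 1) unname(parts)
  else paste0("GEOMETRYCOLLECTION (", paste(parts, collapse = ", "), ")")
}

mp <- cbind(x = c(1, 2, 3), y = c(1, 2, 3))
rownames(mp) <- c("1", "1", "1")  # duplicate row names are allowed on a matrix
coords_to_wkt(mp)                 # "MULTIPOINT (1 1, 2 2, 3 3)"
rownames(mp) <- 1:3
coords_to_wkt(mp)                 # "GEOMETRYCOLLECTION (POINT (1 1), POINT (2 2), POINT (3 3))"
```

The encoding only works because a plain matrix tolerates duplicate row names; a data.frame would reject them, which is one reason the convention sits awkwardly with other sp tools.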

I added a check comparing ncol and the id length, and dropped setting the colnames if they differ.

Going through gIntersection with byid=FALSE first sets all row.names to "1" (signalling a single returned MultiPoint, as implicitly requested by the choice of byid). This triggers a MultiPoint representation on reading, yielding a single MP object internally rather than a collection of 4 points, so gDistance returned a scalar:

> sp <- gIntersection(sp, col)
> row.names(sp)
[1] "1" "1" "1" "1"
> gDistance(sp, border, byid=TRUE)
       [,1]
1 0.5750082
> row.names(sp) <- 1:length(sp)
> gDistance(sp, border, byid=TRUE)
         1        2         3         4
1 2.653788 2.784986 0.5876864 0.5750082

So it's all down to rev 76 and using identical rownames on coordinate matrices:

https://r-forge.r-project.org/scm/viewvc.php/pkg/src/rgeos_R2geos.c?root=rgeos&r1=74&r2=76

Thanks for the hard work & history digging! Why doesn't gDistance set proper row.names, in case sp@coords has NULL dimnames, before doing anything else?

I don't follow: in what context? When sp@coords has NULL rownames, row.names(sp) returns an integer vector, but it does return something. Do you have a case?

> sp <- gIntersection(sp, col)
> row.names(sp)
[1] "1" "1" "1" "1"
> row.names(sp) = 1:4
> gDistance(sp, border, byid=TRUE)
         1        2         3         4
1 2.653788 2.784986 0.5876864 0.5750082

works as intended. Why doesn't gIntersection set proper row.names (seq_len(length(sp))) before returning the object?
And yes, you could argue that row.names for SpatialPoints lies, but that is a different issue: row.names(obj)=row.names(obj) changes obj if dimnames(obj@coords)[[1]] is NULL.
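The row.names(obj) = row.names(obj) point can be reproduced without sp. The two functions below are hypothetical stand-ins written for illustration, assuming the behaviour described in this thread: the getter falls back to an integer sequence when the coordinate matrix has NULL row names, and the setter stores whatever it is given as row names. Assigning the getter's result back therefore changes the object.

```r
# Hypothetical stand-ins (not sp's actual methods) for row.names() and
# row.names<-() acting on a SpatialPoints-style coordinate matrix
get_ids <- function(crds) {
  ids <- rownames(crds)
  # NULL row names: report 1..n as an integer vector
  if (is.null(ids)) seq_len(nrow(crds)) else ids
}
set_ids <- function(crds, value) {
  # matrix dimnames are stored as character, so 1:n becomes "1", "2", ...
  rownames(crds) <- value
  crds
}

crds <- cbind(c(1, 2), c(3, 4))
same <- set_ids(crds, get_ids(crds))
identical(crds, same)  # FALSE: NULL row names became c("1", "2")
```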

Simple. When geos2R (in C) receives an outgoing GEOS MULTIPOINT object, it flags that class by setting all the rownames to the given value (for a GEOMETRYCOLLECTION of MULTIPOINT objects, each MP gets its own value).

Colin used the same rownames in a SpatialPoints object to show which MP each point belonged to. With MP support in rgeos, the R2geos and geos2R processes would not need to use the hack (I think).
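The flagging scheme for a GEOMETRYCOLLECTION of MULTIPOINT objects can be sketched in base R. `flatten_multipoints` and `split_multipoints` are hypothetical helpers, not rgeos functions: the first stacks the member coordinate matrices and tags each row with its member index as the row name (matching the "1" "1" "1" "2" "2" "2" output shown in this thread), and the second recovers the members by grouping on those row names.

```r
# Hypothetical helpers (not part of rgeos) sketching the row-name flagging
flatten_multipoints <- function(members) {
  crds <- do.call(rbind, members)
  # one id per member, repeated once for each of its points
  ids <- rep(seq_along(members), times = vapply(members, nrow, integer(1)))
  rownames(crds) <- as.character(ids)
  crds
}
split_multipoints <- function(crds) {
  ids <- rownames(crds)
  # group row indices by id, in order of first appearance
  rows <- split(seq_len(nrow(crds)), factor(ids, levels = unique(ids)))
  lapply(rows, function(i) crds[i, , drop = FALSE])
}

mp1 <- cbind(c(1, 2, 3), c(1, 2, 3))
mp2 <- cbind(c(1, 2, 3), c(0, 1, 2))
coll <- flatten_multipoints(list(mp1, mp2))
rownames(coll)                   # "1" "1" "1" "2" "2" "2"
length(split_multipoints(coll))  # 2
```

The sketch inherits the ambiguity discussed above: a lone MULTIPOINT and a set of separate points that happen to share one id produce identical row names, which is how gDistance ended up treating four points as one object.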

If we reset it, you get the gIsSimple(readWKT("MULTIPOINT(1 1,2 2,3 3)")) problem:

> mp <- readWKT("MULTIPOINT(1 1,2 2,3 3)")
> row.names(mp)
[1] "1" "1" "1"
> gIsSimple(mp)
[1] TRUE
> writeWKT(mp)
[1] "MULTIPOINT (1.0000000000000000 1.0000000000000000, 2.0000000000000000 2.0000000000000000, 3.0000000000000000 3.0000000000000000)"
> row.names(mp) <- 1:3
> writeWKT(mp)
[1] "GEOMETRYCOLLECTION (POINT (1.0000000000000000 1.0000000000000000), POINT (2.0000000000000000 2.0000000000000000), POINT (3.0000000000000000 3.0000000000000000))"
> gIsSimple(mp)
Error in RGEOSUnaryPredFunc(spgeom, byid, "rgeos_issimple") : 
  IllegalArgumentException: This method does not support GeometryCollection arguments
> gIsSimple(mp[1])
[1] TRUE
> gcmp <- readWKT("GEOMETRYCOLLECTION (MULTIPOINT(1 1,2 2,3 3), MULTIPOINT(1 0,2 1,3 2))")
> row.names(gcmp)
[1] "1" "1" "1" "2" "2" "2"

which is what stopped me from changing the behaviour of geos2R on Saturday, as the check and tests failed.