eblondel / cleangeo

Cleaning geometries from spatial objects in R

Home Page: https://github.com/eblondel/cleangeo/wiki

dependency on retiring spatial infrastructure packages

rsbivand opened this issue

See #26

You will be aware, for example from:
https://r-spatial.org/r/2022/04/12/evolution.html,
https://r-spatial.org/r/2022/12/14/evolution2.html,
https://r-spatial.org/r/2023/04/10/evolution3.html and
https://rsbivand.github.io/csds_jan23/bivand_csds_ssg_230117.pdf and
perhaps from viewing https://www.youtube.com/watch?v=TlpjIqTPMCA&list=PLzREt6r1NenmWEidssmLm-VO_YmAh4pq9&index=1,
that rgdal, rgeos and maptools will be retired this
year, in October 2023.

rgeos::gBuffer, rgeos::gCrosses, rgeos::gDifference, rgeos::getScale, rgeos::gIntersection, rgeos::gIsValid and rgeos::gUnionCascaded are detected by pkgapi; there may be more uses. All of these can be worked around by coercing to sf and perhaps using st_make_valid; those GEOS facilities were not available when this package began. s2 is a different question, but also relevant, in that here one would have to turn off s2 to keep legacy behaviour, and the default precision in sf differs from rgeos. Action is needed urgently; otherwise this package will be archived when rgeos retires.
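
A small sketch of the two ingredients in the paragraph above, written as stdlib Python data and logic rather than R, so nothing here calls sf itself. It records a lookup from the rgeos functions pkgapi detected to their closest sf counterparts (my reading of the sf reference; getScale is only a loose match), and the engine-selection rule stated later in this thread: sf uses s2 only for ellipsoidal (lon/lat) coordinates while s2 is switched on, and GEOS otherwise. In R, the switch is `sf::sf_use_s2(FALSE)` and validity repair is `sf::st_make_valid()`; the names `RGEOS_TO_SF` and `choose_engine` are hypothetical.

```python
# Hypothetical lookup table: rgeos function -> closest sf counterpart.
RGEOS_TO_SF = {
    "rgeos::gBuffer": "sf::st_buffer",
    "rgeos::gCrosses": "sf::st_crosses",
    "rgeos::gDifference": "sf::st_difference",
    "rgeos::gIntersection": "sf::st_intersection",
    "rgeos::gIsValid": "sf::st_is_valid",
    "rgeos::gUnionCascaded": "sf::st_union",
    # Approximate only: rgeos::getScale reports a global precision-model scale,
    # while sf attaches precision per object via st_precision()/st_set_precision().
    "rgeos::getScale": "sf::st_precision",
}


def choose_engine(is_longlat: bool, use_s2: bool) -> str:
    """Hypothetical model of sf's dispatch: s2 only for lon/lat data with
    s2 enabled; GEOS for projected data or when s2 is switched off."""
    return "s2" if (is_longlat and use_s2) else "GEOS"


if __name__ == "__main__":
    for old, new in RGEOS_TO_SF.items():
        print(f"{old:<24} -> {new}")
    # Turning s2 off restores GEOS (legacy) handling for lon/lat data.
    print(choose_engine(is_longlat=True, use_s2=False))
```

Under this model, turning s2 off is the only way to get GEOS behaviour for lon/lat data without projecting it first.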

Dear @rsbivand, thanks for this. Indeed, I've been using sf recently, including testing various common cases of geometry validity issues, and all were properly solved by sf (cc @edzer). This package will likely be archived, in the same way sp will be. I haven't found time to update this repo, but I will probably update the README with pointers to sf's st_make_valid, which is now the preferred way to validate geometries.

Indeed, s2 has to be turned off to keep legacy behaviour. I also raised this question with @edzer on the sf repo, because common, basic geoprocessing operations, e.g. computing the union of a subset of countries, fail with s2 activated and work well when s2 is deactivated (and all geometries are valid according to the OGC specification). I would expect this kind of processing to work even with s2 activated. Shouldn't it?

There are currently no plans to archive sp. For validity on the sphere, see https://r-spatial.org/book/04-Spherical.html#validity-on-the-sphere

I'll include a link to this issue when contacting the micromapST maintainer. Maybe run a deprecation release to alert users with workflows including cleangeo before archiving. So much has changed in vector geometry cleaning, including grid density in the NG GEOS engine, that documentation would be helpful.

@rsbivand @edzer Long story short for cleangeo: I'm aware of the October deadline. In some other packages I manage, I have already made the transition to remove dependencies on the above packages, and I'll try to do the same ASAP for cleangeo to keep it maintained on CRAN. It's not at the top of my priority list right now, but I will try to book a timeslot ASAP. In practice (at least to ensure backward compatibility for its users) I'd like to keep it on CRAN; otherwise I will release a note saying it will be archived, with pointers. @rsbivand, as you said, so much has changed in vector geometry cleaning, and I concur. From experience, in UN-FAO projects and EC research projects we have already switched to sf, using sf::st_make_valid when needed.

Regarding s2, I also have to look into it more deeply (lack of time), but from the default behaviour on simple cases (nothing here about special cases at the limits of WGS84, such as the poles or the dateline), I still have doubts about how this is managed, and I want to dig into the s2 specifications. For example (just the latest one I had to deal with), a simple st_union of West African countries (taken from a UN source) with valid geometries should work whether or not s2 is activated. It works with s2 deactivated; it doesn't with s2. Referring to standards, if s2 is compliant with the ISO/OGC specifications, I would assume this process is reproducible. It's not.

As for its use as the default in sf, I remain quite puzzled:

  • First, because IMHO it is misleading with respect to the stated scope of sf, which deals with "sf", i.e. Simple Features, an ISO/TC211 and OGC standard, although this is not explicitly mentioned in the package documentation. Looking at https://github.com/google/s2geometry I didn't find any pointer or reference to these standards; if you have one, I would be grateful if you could share it.
  • The default behaviour of sf in previous releases was tightly bound to these standards through implementing libraries (GEOS). The same behaviour was reproducible in other programming languages (e.g. JTS / Java, GEOS through PostGIS/SQL) when fetching data in systems where R is one of the programming languages used for some, but not all, system layers. As I said on sf#2141, the purpose is not to criticize the s2 capability, but to ensure, at minimum, backward compatibility. IMHO, whenever s2 adds value and should be used, users could switch it on, as an sf extension/plugin, moving into s2 geometry handling, while keeping it FALSE by default.

Looking at the s2 GitHub repo and its mention of 0.x releases (as indicated in its README), it is far from being a stable library. In general, although I use Google services daily, I'm also puzzled about the longevity of such libraries: in my experience, Google's tend to change or die quite quickly and be replaced by others, which is not comparable to the longevity of products like GEOS, JTS, etc. This is the case for Google web-mapping.

Still, setting it as the default in sf seems, at the very least, premature, especially when systems rely extensively on ISO/OGC for spatial data handling, and the introduction of s2 as the default breaks compatibility with ongoing processing. Standardization is not something done for fun: reproducibility and sustainability of processing are key criteria, and this is also why spatial data handling in R has been progressively 'migrated' from sp to sf (because sf was built on the ISO/OGC Simple Features model and brings the standard layer missing in sp).
If you think this deserves a live discussion, count me in; I'd be happy to discuss this further.

Cheers

sf only uses s2 by default when coordinates are ellipsoidal. The SFA standard never even mentions whether it should be applied only to projected coordinates or also to ellipsoidal coordinates, but the implicit assumption seems to be: projected, 2D, Cartesian. Lots of things go wrong when you apply 2D (GEOS) routines to ellipsoidal coordinates: distances, areas, buffers, predicates. GeoPandas, for instance, will warn you, but still do the wrong thing; the only opt-out there is to project. For projected data, sf also uses GEOS.
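
As a rough illustration of the distance and area point, the stdlib-only Python sketch below (not sf, GEOS or s2 code) compares a planar computation on raw lon/lat values with spherical formulas. It assumes a spherical Earth with a mean radius of 6371 km, uses the haversine formula for distances and R² · Δλ · (sin φ2 - sin φ1) for the area of a lon/lat cell; all function names are hypothetical.

```python
import math

# Assumption: a spherical Earth with mean radius 6371 km (not an ellipsoid).
EARTH_RADIUS_KM = 6371.0


def planar_distance_deg(lon1, lat1, lon2, lat2):
    """Euclidean distance on raw lon/lat values, as a 2D planar routine would
    compute it. The result is in 'degrees', which is not a physical length."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance on the sphere, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def lonlat_cell_area_km2(lon1, lat1, lon2, lat2):
    """Area of the lon/lat 'rectangle' [lon1, lon2] x [lat1, lat2] on the sphere:
    R^2 * dlambda * (sin(phi2) - sin(phi1))."""
    dlam = math.radians(lon2 - lon1)
    return EARTH_RADIUS_KM ** 2 * dlam * (
        math.sin(math.radians(lat2)) - math.sin(math.radians(lat1))
    )


if __name__ == "__main__":
    for lat in (0.0, 60.0):
        # A planar routine sees the same 1-degree step (and 1 square degree)
        # at every latitude; the spherical values shrink with latitude.
        print(
            f"lat={lat:>4}: planar={planar_distance_deg(0, lat, 1, lat):.1f} deg, "
            f"sphere={haversine_km(0, lat, 1, lat):.1f} km, "
            f"cell area={lonlat_cell_area_km2(0, lat, 1, lat + 1):.0f} km^2"
        )
```

The planar routine reports identical numbers at the equator and at 60°N, whereas on the sphere both the 1-degree distance and the 1-degree cell area at 60°N are roughly half their equatorial values, since cos 60° = 0.5.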

I wouldn't worry about support from Google: s2geometry powers all their spatial products, including BigQuery GIS, Maps and GEE. Their response to issues, triggered e.g. by CRAN checks using unreleased compiler versions, has been very good.

Of course we can remain in the GIS & OGC SFA realm and say the world is flat, but really, it is round. Global actors like Google but also Palantir (who funded PostGIS' geography) or Uber (H3) moved on. I think the rest of us should also move on. But you can of course opt out.

Again, I suspect you misinterpret what I'm saying: the question is not whether to opt in or opt out of s2.

If you consider that staying in the GIS & OGC realm (which is about standardization above all, not about technology) means opting out, I don't see it that way. Personally, I don't claim to question standards, much less to suggest that technical working groups like ISO/TC211 and OGC hold a vision that the world is flat rather than round (I have to admit I'm a bit "astonished" by your statements...). Standards have their limits, but standards are revised and evolve.

Yes, I think technology should build on standards: if you want to move in a sustainable way, then you should take an interest in the standardization processes. Global software actors decide to move on because they thrive on innovation, and because it's their business (in the case of Google), but from a consumer perspective (especially data managers, whatever the scale), if you care about reproducibility, sustainability and minimizing dependencies, you will look at those that align with standards, and, when needed, global software actors (companies or open-source communities) will push to improve standards. This is what the main OSGeo members are doing with OGC: they push it to improve standards, but look at how they build the software behind it: in the end, they all rely on standards.

The primary question is whether or not you consider sf a library that should be used as part of business processes. As for any software, backward compatibility is a minimum requirement, and moving into s2 as default seems to change radically the way spatial data is handled.

(Cheers)

moving into s2 as default seems to change radically the way spatial data is handled.

Yes, but only when working with ellipsoidal coordinates, and for good reasons.

Thanks for not paying attention to all the rest of my comments (evidently you reply to what you wish, but that's your right!). I also understand that you are not open to a live discussion, which I suggested above but you didn't reply to (which is a pity, btw). I see you also closed the ticket related to the discussion on setting s2 by default, so I take it that you don't want to discuss this at all.

I'll keep this ticket open and will report back to @rsbivand when I have time to tackle the cleangeo dependencies, which was the initial purpose of Mr Bivand's ticket. Regarding s2, I note that he also advised turning off s2 to preserve the legacy behaviour, which is good (backward compatibility). A pity the same is not applied by sf itself.

Regards

Emmanuel, I'd love to have a live discussion with you, but in person, and not in writing. I think we simply disagree. You admit above that you haven't dug into s2 yet. I'd be happy to continue the discussion when you've done that. I feel continuing this discussion means we repeat our arguments. I closed the discussion at r-spatial/sf#2141 because it wasn't moving, and nobody joined; I also decided (recently) to clean up open issues that didn't move. Feel free to reopen when there are new arguments.

Dear @edzer, good to hear :-) Not in writing: that is what I meant by a live discussion.

I have dug enough to see that s2 has its own data model, which is not ISO/OGC-based. A geometry that is valid according to ISO/OGC can become invalid according to the s2 geometry validity rules (I've mentioned an example here: eblondel/ows4R#106), which is a strong constraint when dealing with web spatial data infrastructures (which in most cases are OGC-based). Data that is valid in an OGC-compliant SDI (meaning stored in datastores like PostGIS/GeoPackage, exposed through an OGC data server, exported in standard formats like GML/GeoJSON) becomes invalid once fetched with sf's defaults, so all downstream geoprocessing that assumes valid geometries fails. In practice, the results of this sf geoprocessing are then served back into OGC-based spatial data infrastructures, which means the geometries we push back must be OGC-valid.
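
One plausible source of such discrepancies (not necessarily the cause of the ows4R#106 case), sketched in stdlib-only Python rather than with sf or s2: s2 treats each edge between two vertices as a geodesic (a great-circle arc) on the sphere, whereas planar GEOS/SFA processing treats it as a straight segment in lon/lat space. The same vertex list therefore describes a different shape, and a ring that is valid in the plane can, for instance, self-intersect on the sphere. The function names are hypothetical.

```python
import math


def to_unit_vector(lon_deg, lat_deg):
    """Convert lon/lat in degrees to a 3D unit vector on the sphere."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def to_lon_lat(v):
    """Convert a (not necessarily unit) 3D vector back to lon/lat in degrees."""
    x, y, z = v
    return (math.degrees(math.atan2(y, x)), math.degrees(math.atan2(z, math.hypot(x, y))))


def planar_midpoint(p, q):
    """Midpoint of the edge p-q read as a straight line in the lon/lat plane."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def geodesic_midpoint(p, q):
    """Midpoint of the shortest great-circle arc between p and q.
    The sum of the two unit vectors points at that midpoint; this
    assumes p and q are not antipodal (the sum would then be zero)."""
    a, b = to_unit_vector(*p), to_unit_vector(*q)
    return to_lon_lat(tuple(ai + bi for ai, bi in zip(a, b)))


if __name__ == "__main__":
    # An edge whose two vertices both lie on the 60th parallel.
    p, q = (0.0, 60.0), (90.0, 60.0)
    print(planar_midpoint(p, q))    # (45.0, 60.0)
    print(geodesic_midpoint(p, q))  # approximately (45.0, 67.79)
```

The planar reading keeps the middle of this edge on the 60th parallel, while the great-circle reading bulges poleward to roughly 67.8°N, so a nearby vertex lying between those two curves falls on opposite sides of the edge depending on the model.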

Ok for the ticket closing strategy, understood.

See #28 (comment); which OGC document describes how "OGC valid" is defined for unprojected, ellipsoidal coordinates?

Only three months remain to save this package. Please act now!

Not right now. I have 3 months, as you say :-); I have my priority duties, and cleangeo is not among them.

Yes, I took some time yesterday evening to move ahead and push a revision to CRAN. Great to hear it's clean now.