apache / sedona

A cluster computing framework for processing large-scale geospatial data

Home Page: https://sedona.apache.org/

Specify custom transformation parameters/WKT string from CoordinateSystem A to CoordinateSystem B.

robertnagy1 opened this issue · comments

Does Apache Sedona use pyproj for coordinate transformations or does it use other tools?
I don't know if this is possible in Apache Sedona, but I did not see ST_Transform handle custom transformations between coordinate systems. I do not mean custom coordinate systems; I mean custom transformations.

When running a function such as ST_Transform(geometry, EPSG:From, EPSG:To, Transformation EPSG), I should be able to control which transformation is used between the two datums by passing an EPSG code or WKT string as a parameter.

For example:
ST_Transform(geometry, (From) 4258, (To) 4326, (Using) 1612)

Why is this important? There are many ways to transform between two geodetic datums, and the choice can have a huge impact on the accuracy of the resulting coordinates. In this case, if I do not control how the coordinate system transformation happens, the error can be tens of meters, because there are roughly 36 ways to go from EPSG:4258 to EPSG:4326. This choice is not exposed to the end user at all. By controlling which transformation is used, in this case 1612, the accuracy will be around 1 meter.

Is there any way this can be exposed to the end user?

I'm not a Sedona developer, but I can provide a few hints. Sedona currently uses GeoTools for coordinate transformations. There is a discussion about replacing GeoTools with Apache SIS, at least for licensing reasons (there are also other advantages), but I do not know what the timeline would be.

The problem that you describe is very real, and is one of the spatial referencing complexities that not many developers are aware of. The PROJ and Apache SIS libraries can mitigate this problem by allowing users to specify an area of interest. Those libraries then select the coordinate transformation whose domain of validity has the largest intersection with the specified area. I do not remember if GeoTools has this capability too. But if Sedona used a library with this capability, then ideally Sedona should automatically provide the bounding box of the data (if this information is available cheaply) to the referencing library so that it can choose a better transformation. This approach would improve the results without additional effort on the user's side.
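To illustrate the selection rule described above, here is a minimal sketch in plain Python. It is not how PROJ or Apache SIS implement it: the operation identifiers and their domains of validity are hypothetical, areas are compared in square degrees, and antimeridian crossing is ignored. Real libraries also weigh other criteria such as stated accuracy.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Bounding box as (west, south, east, north), in degrees.
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class CandidateOperation:
    code: str      # hypothetical identifier, not a real EPSG code
    domain: BBox   # hypothetical domain of validity


def intersection_area(a: BBox, b: BBox) -> float:
    """Area (in square degrees) of the overlap of two boxes, 0.0 if disjoint."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    return width * height


def select_operation(
    candidates: Sequence[CandidateOperation], area_of_interest: BBox
) -> Optional[CandidateOperation]:
    """Return the candidate whose domain overlaps the area of interest the most."""
    best: Optional[CandidateOperation] = None
    best_area = 0.0
    for candidate in candidates:
        area = intersection_area(candidate.domain, area_of_interest)
        if area > best_area:
            best, best_area = candidate, area
    return best


# Hypothetical candidates for one source/target CRS pair.
candidates = [
    CandidateOperation("OP-EUROPE", (-10.0, 35.0, 30.0, 70.0)),
    CandidateOperation("OP-NORTH", (0.0, 55.0, 30.0, 72.0)),
    CandidateOperation("OP-SOUTH", (-10.0, 35.0, 20.0, 45.0)),
]

# Bounding box of the data, as Sedona might supply it to the referencing library.
data_bbox = (5.0, 50.0, 12.0, 63.0)
chosen = select_operation(candidates, data_bbox)
print(chosen.code if chosen else "no applicable operation")
```

If the data's bounding box falls outside every domain, the function returns `None`, which is where a library would fall back to a default or report that no operation applies.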

Alternatively, for more advanced users, Apache SIS (and I think also PROJ) allows them to specify explicitly the EPSG code of the coordinate operation that they want to use. Making such existing capabilities accessible to users could be a suggestion for future Sedona evolution.

In the near future, with the evolution of standards such as ISO 19111:2019, providing the CRS will no longer be sufficient for users who want centimetric precision. It will also be necessary to specify the temporal epoch. So some evolution of the Sedona API will be needed anyway.
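As a rough illustration of why the epoch matters, the sketch below moves a projected coordinate from one epoch to another under a constant-velocity assumption. The velocity values are illustrative only (plate motion is on the order of centimetres per year), and the function name is hypothetical; real workflows rely on published velocity or deformation models.

```python
import math
from typing import Tuple


def propagate_to_epoch(
    position_m: Tuple[float, float],
    velocity_m_per_year: Tuple[float, float],
    from_epoch: float,
    to_epoch: float,
) -> Tuple[float, float]:
    """Shift (easting, northing) in metres assuming a constant velocity."""
    years = to_epoch - from_epoch
    return (
        position_m[0] + velocity_m_per_year[0] * years,
        position_m[1] + velocity_m_per_year[1] * years,
    )


# Illustrative values only: 1.5 cm/yr east and 2.0 cm/yr north.
observed = (500000.0, 6500000.0)   # coordinate observed at epoch 2000.0
velocity = (0.015, 0.020)
moved = propagate_to_epoch(observed, velocity, 2000.0, 2020.0)

# Distance between the two positions, in metres.
displacement = math.hypot(moved[0] - observed[0], moved[1] - observed[1])
print(f"displacement after 20 years: {displacement:.2f} m")
```

With these assumed values the same physical point shifts by about half a metre over twenty years, far more than a centimetric accuracy budget, which is why a CRS alone would not pin down the coordinate.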

Thank you for your input. I agree with what you say. A bounding box would solve some of the issues, but there is also a historical/time dimension: a better transformation may be available today, yet we may want to choose the one that was used earlier, so that we transform using the same definition.

Thank you for the insightful discussion. The Sedona community has realized the importance of a comprehensive CRS library. We started migrating to Apache SIS in the 1.6.0 release. Hopefully this can be completed in 1.6.1 or 1.7.0, which is 3 to 6 months away.