ST_Pixelize small polygon error

Question

ricg72 opened this issue 3 months ago · comments

ricg72 commented 3 months ago

Expected behavior

ST_Pixelize returns 0 pixels

Actual behavior

ST_Pixelize throws an assertion error:

Caused by: java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:208)
at org.apache.spark.sql.sedona_viz.expressions.ST_Pixelize.eval(Pixelize.scala:119)
at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:160)

Steps to reproduce the problem

import org.apache.spark.sql.functions.expr
import spark.implicits._

case class A(g: String)
// No integer coordinate lies between a0 and a1 here; change them so that one does to compare
val a0 = "3.1"
val a1 = "3.8"
val d = Seq(A(s"POLYGON (($a0 $a0, $a1 $a0, $a1 $a1, $a0 $a1, $a0 $a0))"))
val df = d.toDS().toDF()
  .withColumn("geo", expr("ST_GeomFromWKT(g)"))
  .withColumn("area", expr("ST_Area(geo)"))
df.select("geo", "area").show(false)

// show() returns Unit, so its result is not assigned to a val
df.withColumn("px", expr("ST_Pixelize(geo, 10, 10, ST_PolygonFromEnvelope(0, 0, 10, 10))")).show(false)

Settings

Sedona version = 1.5.0
Apache Spark version = 3.3.0
Apache Flink version = ?
API type = Scala
Scala version = 2.12
JRE version = 1.8
Python version = not tested
Environment = Databricks?

Jia Yu commented 3 months ago

@ricg72

ricg72 · Answer 1 · Sat Mar 09 2024 22:33:44 GMT+0800 (China Standard Time)

I suspect the assert in https://github.com/apache/sedona/blob/master/spark/common/src/main/scala/org/apache/spark/sql/sedona_viz/expressions/Pixelize.scala at line 119: assert(pixels.size() > 0)

Jia Yu · Answer 2 · Sun Mar 10 2024 04:27:52 GMT+0800 (China Standard Time)

@ricg72 ST_Pixelize is not supposed to return 0 pixels. For any geometry (polygons, points, ...), it should return at least 1 pixel. There might be something wrong with the logic itself. Do you want to take a stab at it?

ricg72 · Answer 3 · Sun Mar 10 2024 16:58:26 GMT+0800 (China Standard Time)

Hi,
What would the spec be? If the object falls within a single pixel, then displaying that single pixel is OK. It's the long, thin items that I am not clear how to display. Maybe the algorithm could convert the polygon to a line (skeletonize?) and then draw those pixels?

I was actually trying to use ST_Pixelize to get the coordinates of all the pixels in a polygon, so that I could pass them to RS_Values and get all the pixel values in that polygon. Is there a better way to do this?

It's important to know which pixel came from where, and to control precisely which pixels count as inside or outside the polygon (a shift of 0.5 of a coordinate caused problems!).
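To make the "at least one pixel" idea and the 0.5 shift concrete, here is a minimal sketch in plain Scala. It is not Sedona's ST_Pixelize implementation: the names `Box`, `pixelize` and `sampleOffset` are hypothetical, geometries are simplified to axis-aligned boxes, and the rule "a pixel is covered when its sample point lies inside the geometry" is an assumption made for illustration. With `sampleOffset = 0.0` the sample point is the pixel's lower-left corner (an integer coordinate on the 10 x 10 grid from the reproduction); with `0.5` it is the pixel centre.

```scala
// Hypothetical sketch; NOT Sedona's ST_Pixelize. Geometries are simplified to boxes.
object PixelFallback {
  final case class Box(minX: Double, minY: Double, maxX: Double, maxY: Double)

  /** Returns the (col, row) pixels of a cols x rows grid over `bounds` whose sample
    * point lies inside `geom`. If no sample point is covered, falls back to the
    * single pixel that contains the centre of `geom`, so the result is never empty.
    */
  def pixelize(geom: Box, cols: Int, rows: Int, bounds: Box,
               sampleOffset: Double): Seq[(Int, Int)] = {
    val pw = (bounds.maxX - bounds.minX) / cols // pixel width in world units
    val ph = (bounds.maxY - bounds.minY) / rows // pixel height in world units
    val covered = for {
      c <- 0 until cols
      r <- 0 until rows
      x = bounds.minX + (c + sampleOffset) * pw
      y = bounds.minY + (r + sampleOffset) * ph
      if x >= geom.minX && x <= geom.maxX && y >= geom.minY && y <= geom.maxY
    } yield (c, r)

    if (covered.nonEmpty) covered
    else {
      // Fallback: the pixel containing the geometry's centre, clamped to the grid.
      val mx = (geom.minX + geom.maxX) / 2
      val my = (geom.minY + geom.maxY) / 2
      val c = math.min(cols - 1, math.max(0, math.floor((mx - bounds.minX) / pw).toInt))
      val r = math.min(rows - 1, math.max(0, math.floor((my - bounds.minY) / ph).toInt))
      Seq((c, r))
    }
  }
}
```

For the polygon in the reproduction (3.1 to 3.8 on a 10 x 10 grid over 0 to 10), corner sampling covers nothing and the fallback yields the single pixel (3, 3). Long, thin shapes are not handled any better by this fallback; they would still need something like the line-based approach mentioned above.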

Jia Yu · Answer 4 · Mon Mar 11 2024 12:00:00 GMT+0800 (China Standard Time)

Depending on what you want to do with the resulting pixel values, a few options:

RS_Clip: Clip/Crop the image by the given geometry
RS_AsRaster: Rasterize a geometry to a raster using a reference raster. Given two rasters, you can then use RS_MapAlgebra to perform arbitrary operations on the values of the two rasters
RS_ZonalStats: calculate the agg values of pixels inside a given geometry

ricg72 · Answer 5 · Wed Mar 13 2024 18:21:36 GMT+0800 (China Standard Time)

Hi,

Thanks for the suggestions. I'll try and test RS_Clip (RS_AsRaster and RS_ZonalStats won't work for my case).
I suspect RS_Clip is going to cause performance issues because there are many geometries per image. The issue is how to prevent the image from being read multiple times or being shuffled. The only way to tell is to try!

Another approach might be to update RS_Values to take an array of polygons instead of just an array of points - it would need to return an array[array[pixel values]] so that we could tell which pixel values belong to which geometry.
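The return shape proposed here can be sketched in plain Scala. This is only an illustration of the idea, not the RS_Values API: `ValuesPerPolygon`, `contains` and `valuesPerPolygon` are hypothetical names, the raster is a plain `Array[Array[Double]]` indexed as `grid(row)(col)`, polygons are single rings in pixel coordinates, and a pixel is assumed to belong to a polygon when its centre `(col + 0.5, row + 0.5)` lies inside the ring.

```scala
// Hypothetical sketch of an "array of polygons in, array of value arrays out" lookup.
// This is NOT the RS_Values API; it only illustrates the proposed return shape.
object ValuesPerPolygon {
  type Ring = Seq[(Double, Double)] // polygon outer ring as (x, y) vertices

  /** Ray-casting point-in-polygon test for a single ring. */
  def contains(ring: Ring, x: Double, y: Double): Boolean = {
    var inside = false
    var j = ring.length - 1
    for (i <- ring.indices) {
      val (xi, yi) = ring(i)
      val (xj, yj) = ring(j)
      // Toggle when the horizontal ray from (x, y) crosses the edge (j -> i).
      if ((yi > y) != (yj > y) && x < (xj - xi) * (y - yi) / (yj - yi) + xi) inside = !inside
      j = i
    }
    inside
  }

  /** One inner sequence per polygon, in input order, so each value can be
    * traced back to the geometry it came from. A polygon that covers no
    * pixel centre yields an empty sequence instead of failing.
    */
  def valuesPerPolygon(grid: Array[Array[Double]], polygons: Seq[Ring]): Seq[Seq[Double]] =
    polygons.map { ring =>
      for {
        r <- grid.indices
        c <- grid(r).indices
        if contains(ring, c + 0.5, r + 0.5)
      } yield grid(r)(c)
    }
}
```

Because the raster is scanned once per polygon inside a single call, the image would only need to be loaded once for all geometries, which is the concern raised about RS_Clip; whether that holds inside Sedona depends on how such a function would actually be implemented.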