apache / sedona

A cluster computing framework for processing large-scale geospatial data

Home Page: https://sedona.apache.org/

ST_Pixelize small polygon error

ricg72 opened this issue

Expected behavior

ST_Pixelize returns 0 pixels

Actual behavior

ST_Pixelize throws an assertion error:

Caused by: java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:208)
at org.apache.spark.sql.sedona_viz.expressions.ST_Pixelize.eval(Pixelize.scala:119)
at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:160)

Steps to reproduce the problem

import org.apache.spark.sql.functions.expr
import spark.implicits._ // assumes a SparkSession `spark` with Sedona functions registered

case class A(g: String)
// No integer lies between these values; choosing a0 and a1 so that an integer
// lies between them avoids the error
val a0 = "3.1"
val a1 = "3.8"
val d = Seq(A(s"POLYGON (($a0 $a0, $a1 $a0, $a1 $a1, $a0 $a1, $a0 $a0))"))
val df = d.toDS().toDF()
  .withColumn("geo", expr("ST_GeomFromWKT(g)"))
  .withColumn("area", expr("ST_Area(geo)"))
df.select("geo", "area").show(false)

// Fails with java.lang.AssertionError in ST_Pixelize.eval
val df2 = df.withColumn("px", expr("ST_Pixelize(geo, 10, 10, ST_PolygonFromEnvelope(0, 0, 10, 10))"))
df2.show(false)
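The reporter's note about integers suggests that pixels are selected by testing grid points for containment. Under that assumption, which is not confirmed from the Sedona source, the empty result can be reproduced in plain Scala. A 10x10 grid over (0,0)-(10,10) has 1-unit pixels, so its grid points sit on integer coordinates, and a square spanning 3.1 to 3.8 contains none of them. `CornerSampling` and `gridPointsInside` are hypothetical names used only for this sketch.

```scala
// Hypothetical model of grid-point (corner) sampling; NOT Sedona's implementation.
object CornerSampling {
  /** Grid points (i, j), located at (minX + i*w, minY + j*h), that fall strictly
    * inside the axis-aligned box [x0, x1] x [y0, y1]. */
  def gridPointsInside(
      x0: Double, y0: Double, x1: Double, y1: Double,
      resX: Int, resY: Int,
      minX: Double, minY: Double, maxX: Double, maxY: Double
  ): Seq[(Int, Int)] = {
    val w = (maxX - minX) / resX // pixel width
    val h = (maxY - minY) / resY // pixel height
    for {
      i <- 0 until resX
      j <- 0 until resY
      px = minX + i * w
      py = minY + j * h
      if px > x0 && px < x1 && py > y0 && py < y1
    } yield (i, j)
  }
}
```

`CornerSampling.gridPointsInside(3.1, 3.1, 3.8, 3.8, 10, 10, 0, 0, 10, 10)` is empty. Widening the square to 3.1..4.8 yields the single point (4, 4). This is consistent with the note in the reproduction code.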

Settings

Sedona version = 1.5.0
Apache Spark version = 3.3.0
Apache Flink version = ?
API type = Scala
Scala version = 2.12
JRE version = 1.8
Python version = not tested
Environment = Databricks?

@ricg72 ST_Pixelize is not supposed to return 0 pixels. For any geometry (polygons, points, ...), it should return at least 1 pixel. There might be something wrong with the logic itself. Do you want to take a stab at it?
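One way to guarantee at least one pixel is to select every grid cell that the geometry's bounding box overlaps, instead of sampling points. This sketch assumes pixels are the cells of a regular grid over the boundary. Under that rule, any non-empty envelope inside the boundary overlaps at least one cell. This is an illustrative rule, not Sedona's specified behavior, and `cellsOverlapping` is hypothetical. It over-selects for non-rectangular geometries, which would need an exact cell/geometry intersection test.

```scala
// Hypothetical "at least one pixel" rule: every grid cell overlapped (or touched on
// its edge) by the bounding box [x0, x1] x [y0, y1].
object EnvelopeCells {
  def cellsOverlapping(
      x0: Double, y0: Double, x1: Double, y1: Double,
      resX: Int, resY: Int,
      minX: Double, minY: Double, maxX: Double, maxY: Double
  ): Seq[(Int, Int)] = {
    val w = (maxX - minX) / resX
    val h = (maxY - minY) / resY
    // Column/row index of a coordinate, clamped to the grid
    def col(x: Double): Int = math.min(resX - 1, math.max(0, math.floor((x - minX) / w).toInt))
    def row(y: Double): Int = math.min(resY - 1, math.max(0, math.floor((y - minY) / h).toInt))
    // A box entirely outside the boundary selects nothing
    if (x1 < minX || x0 > maxX || y1 < minY || y0 > maxY) Seq.empty
    else
      for {
        i <- col(x0) to col(x1)
        j <- row(y0) to row(y1)
      } yield (i, j)
  }
}
```

For the square in the report (3.1 to 3.8), this returns the single cell (3, 3).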

Hi,
what would the spec be? If the object falls within a single pixel, then displaying that single pixel is OK. It's the long, thin geometries that I'm not sure how to display. Maybe the algorithm could convert the polygon to a line (skeletonize?) and then draw those pixels?
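If a thin geometry were first reduced to a line as suggested, a standard line-rasterization algorithm could draw its pixels. This is a minimal sketch of Bresenham's algorithm. It assumes the endpoints have already been converted to integer pixel indices. Skeletonization itself is not shown, and `LinePixels.bresenham` is not a Sedona function.

```scala
// Bresenham's line algorithm (all octants): the pixels on the segment from
// (x0, y0) to (x1, y1), where endpoints are integer pixel indices.
object LinePixels {
  def bresenham(x0: Int, y0: Int, x1: Int, y1: Int): List[(Int, Int)] = {
    val dx = math.abs(x1 - x0)
    val dy = -math.abs(y1 - y0)
    val sx = if (x0 < x1) 1 else -1
    val sy = if (y0 < y1) 1 else -1
    val out = scala.collection.mutable.ArrayBuffer.empty[(Int, Int)]
    var x = x0
    var y = y0
    var err = dx + dy // combined error term for both axes
    var done = false
    while (!done) {
      out += ((x, y))
      if (x == x1 && y == y1) done = true
      else {
        val e2 = 2 * err
        if (e2 >= dy) { err += dy; x += sx } // step in x
        if (e2 <= dx) { err += dx; y += sy } // step in y
      }
    }
    out.toList
  }
}
```

Each segment produces max(|dx|, |dy|) + 1 pixels, so even a degenerate segment yields one pixel.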

I was actually trying to use ST_Pixelize to get the coordinates of all the pixels in a polygon, so I could pass them to RS_Values and get all the pixel values in that polygon. Is there a better way to do this?

It's important to know which pixel came from which geometry, and to control precisely which pixels are inside or outside the polygon (a shift of 0.5 in a coordinate caused problems!).
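Controlling precisely which pixels are inside usually comes down to fixing one convention for mapping between world coordinates and pixel indices. The sketch below uses an assumed convention: column i and row j are counted from the boundary's minimum x and y, and floor is used rather than rounding. This may not be the convention that ST_Pixelize or RS_Values use. The half-pixel term in `pixelCenter` is exactly the kind of 0.5 shift that changes which pixels are selected. `Grid` and its methods are hypothetical.

```scala
// Hypothetical grid convention: pixel (i, j) covers
// [minX + i*w, minX + (i+1)*w) x [minY + j*h, minY + (j+1)*h).
final case class Grid(minX: Double, minY: Double, maxX: Double, maxY: Double, resX: Int, resY: Int) {
  val w: Double = (maxX - minX) / resX
  val h: Double = (maxY - minY) / resY

  /** Pixel containing a world coordinate (floor, no rounding), or None if outside. */
  def pixelOf(x: Double, y: Double): Option[(Int, Int)] = {
    val i = math.floor((x - minX) / w).toInt
    val j = math.floor((y - minY) / h).toInt
    if (i >= 0 && i < resX && j >= 0 && j < resY) Some((i, j)) else None
  }

  /** World coordinate of a pixel's corner (its minimum x and y). */
  def pixelCorner(i: Int, j: Int): (Double, Double) = (minX + i * w, minY + j * h)

  /** World coordinate of a pixel's center: the corner shifted by half a pixel. */
  def pixelCenter(i: Int, j: Int): (Double, Double) = (minX + (i + 0.5) * w, minY + (j + 0.5) * h)
}
```

With `Grid(0, 0, 10, 10, 10, 10)`, the square from the report (3.1 to 3.8) lies in pixel (3, 3). That pixel's center (3.5, 3.5) is inside the square, but none of the grid's corner points are. A center-sampling rule therefore selects one pixel, while a corner-sampling rule selects none.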

Depending on what you want to do with the resulting pixel values, a few options:

  1. RS_Clip: clip (crop) the raster to the given geometry
  2. RS_AsRaster: rasterize a geometry to a raster using a reference raster. Given two rasters, you can use RS_MapAlgebra to perform arbitrary operations on the values of both rasters
  3. RS_ZonalStats: calculate aggregate statistics of the pixels inside a given geometry

Hi,

Thanks for the suggestions. I'll test RS_Clip (RS_AsRaster and RS_ZonalStats won't work).
I suspect RS_Clip will cause performance issues because there are many geometries per image. The challenge is preventing the image from being read multiple times or shuffled. The only way to tell is to try!

Another approach might be to extend RS_Values to accept an array of polygons instead of just an array of points. It would need to return an array[array[pixel values]] so that we could tell which pixel values belong to which geometry.
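The proposed output shape can be illustrated outside Sedona. This is a minimal sketch under several assumptions. The raster band is an in-memory 2D array indexed `band(row)(col)`, with row 0 at minY. A pixel belongs to a polygon when its center is inside it, tested with even-odd ray casting. `valuesPerPolygon` is hypothetical and is not an existing RS_Values overload. The outer array follows the input polygon order, so each inner array can be traced back to its geometry, and the band is scanned once per polygon without rereading the image.

```scala
// Hypothetical per-polygon pixel extraction returning Array[Array[Double]].
object PolygonValues {
  type Ring = Seq[(Double, Double)] // polygon exterior ring, vertices in order

  /** Even-odd ray casting point-in-polygon test. */
  def contains(ring: Ring, x: Double, y: Double): Boolean = {
    var inside = false
    val n = ring.length
    var k = n - 1
    for (m <- 0 until n) {
      val (xm, ym) = ring(m)
      val (xk, yk) = ring(k)
      // Edge (k, m) straddles the horizontal line through y and crosses it right of x
      if ((ym > y) != (yk > y) && x < (xk - xm) * (y - ym) / (yk - ym) + xm)
        inside = !inside
      k = m
    }
    inside
  }

  /** For each polygon, the values of pixels whose centers fall inside it. */
  def valuesPerPolygon(
      band: Array[Array[Double]], // band(row)(col), row 0 at minY
      minX: Double, minY: Double, pixelW: Double, pixelH: Double,
      polygons: Seq[Ring]
  ): Array[Array[Double]] =
    polygons.map { ring =>
      (for {
        row <- band.indices
        col <- band(row).indices
        cx = minX + (col + 0.5) * pixelW // pixel center x
        cy = minY + (row + 0.5) * pixelH // pixel center y
        if contains(ring, cx, cy)
      } yield band(row)(col)).toArray
    }.toArray
}
```

A polygon that covers no pixel center yields an empty inner array rather than an error.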