apache / sedona

A cluster computing framework for processing large-scale geospatial data

Home Page: https://sedona.apache.org/

ST_Pixelize drawing polygon perimeter rather than all pixels in polygon

ricg72 opened this issue

Expected behavior

Drawing a 10 x 10 polygon should produce 100 pixels (every pixel inside the polygon, not just the outline).

+---------------------------------------+--------+-----+------------+--------------+
|geo                                    |size(px)|Is4_4|st_area(geo)|st_length(geo)|
+---------------------------------------+--------+-----+------------+--------------+
|POLYGON ((1 1, 11 1, 11 11, 1 11, 1 1))|100     |true |100.0       |40.0          |
+---------------------------------------+--------+-----+------------+--------------+

Actual behavior

Drawing the 10 x 10 polygon returns only the pixels on the boundary (36 in the output below; the perimeter length is 40).

Output (size(px) is wrong):

+---------------------------------------+--------+-----+------------+--------------+
|geo                                    |size(px)|Is4_4|st_area(geo)|st_length(geo)|
+---------------------------------------+--------+-----+------------+--------------+
|POLYGON ((1 1, 11 1, 11 11, 1 11, 1 1))|36      |true |100.0       |40.0          |
+---------------------------------------+--------+-----+------------+--------------+

Steps to reproduce the problem

import org.apache.spark.sql.functions.{col, explode, expr, size}
import spark.implicits._

case class A(g: String)

// Square from (a0, a0) to (a1, a1): side 10, area 100
val a0 = 1.0
val a1 = 11.0
val d = Seq(A(s"POLYGON (($a0 $a0, $a1 $a0, $a1 $a1, $a0 $a1, $a0 $a0))"))
val df = d.toDS().toDF()
val df2 = df
  .withColumn("geo",   expr("ST_GeomFromWKT(g)"))
  .withColumn("Is4_4", expr("ST_Contains(geo, ST_Point(4,4))")) // should be in
  .withColumn("px",    expr("ST_Pixelize(geo, 10,10, ST_PolygonFromEnvelope(0,0,12,12))")) // make sure the envelope covers (a0,a0) to (a1,a1)

println(s"Number of pixels: ${df2.select(explode(col("px"))).count()} - would expect ${(a1 - a0) * (a1 - a0)}")

df2.select(col("geo"), size(col("px")), col("Is4_4"), expr("ST_Area(geo)"), expr("ST_Length(geo)")).show(false)

df2.select(explode(col("px"))).show(false)

Settings

Sedona version = 1.5.0
Apache Spark version = 3.3.0
Apache Flink version = ?
API type = Scala
Scala version = 2.12
JRE version = 1.8
Python version = ?
Environment = Databricks

I suspect the problem is in:
https://github.com/apache/sedona/blob/master/spark/common/src/main/scala/org/apache/spark/sql/sedona_viz/expressions/Pixelize.scala
lines 53-55:
case geometry: Polygon => {
  RasterizationUtils.FindPixelCoordinates(resolutionX, resolutionY, boundary, inputGeometry.asInstanceOf[Polygon], reverseCoordinate)
}

Would adding an extra argument of 1.0 to that call select a different overload, one that also returns the interior pixels?

Getting the interior pixels needs some sort of sweep-line algorithm that scans the entire geometry. This was not implemented.
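To make the sweep-line idea concrete, here is a minimal sketch of an even-odd scanline fill in plain Scala. It is not Sedona's code and does not touch RasterizationUtils: the object name ScanlineFill and the helpers fill and outerRing are hypothetical, and the polygon is assumed to be a closed ring already expressed in pixel-grid units (so the mapping from the dataset envelope to the image resolution is left out). A pixel counts as inside when its centre lies between a pair of edge crossings on its row.

```scala
object ScanlineFill {

  /** Pixels (column, row) whose centres lie inside the closed ring (even-odd rule). */
  def fill(ring: Seq[(Double, Double)]): Seq[(Int, Int)] = {
    require(ring.size >= 4 && ring.head == ring.last, "ring must be closed")
    val edges  = ring.zip(ring.tail)
    val minRow = math.floor(ring.map(_._2).min).toInt
    val maxRow = math.ceil(ring.map(_._2).max).toInt
    for {
      row <- minRow until maxRow
      y = row + 0.5 // sweep through the centre of each pixel row
      // x positions where the sweep line crosses a non-horizontal edge
      xs = edges.collect {
        case ((x1, y1), (x2, y2)) if (y1 <= y) != (y2 <= y) =>
          x1 + (y - y1) * (x2 - x1) / (y2 - y1)
      }.sorted
      // consecutive crossings pair up as (enter, leave)
      pair <- xs.grouped(2).toSeq if pair.size == 2
      col <- math.ceil(pair(0) - 0.5).toInt to math.floor(pair(1) - 0.5).toInt
    } yield (col, row)
  }

  /** Filled pixels with at least one 4-neighbour outside the filled set. */
  def outerRing(filled: Set[(Int, Int)]): Set[(Int, Int)] =
    filled.filter { case (c, r) =>
      Seq((c + 1, r), (c - 1, r), (c, r + 1), (c, r - 1)).exists(n => !filled(n))
    }

  def main(args: Array[String]): Unit = {
    // Same square as in the report, read directly as pixel-grid units.
    val square = Seq((1.0, 1.0), (11.0, 1.0), (11.0, 11.0), (1.0, 11.0), (1.0, 1.0))
    val filled = fill(square).toSet
    println(s"filled pixels: ${filled.size}")                // 100
    println(s"outer-ring pixels: ${outerRing(filled).size}") // 36
  }
}
```

For a 10 x 10 block the fill gives 100 pixels, while only 36 of them sit on the outer ring (4 * 10 - 4), which is the count in the actual output above and is consistent with only the outline being rasterized. The cost is one pass over the edges per pixel row, so an active-edge table would be the usual next step for polygons with many vertices.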

Hi,
I think there is already a method that returns all the pixels in a polygon: https://github.com/apache/sedona/blob/master/spark/common/src/main/java/org/apache/sedona/viz/utils/RasterizationUtils.java#L214

public static List<Tuple2<Pixel, Double>> FindPixelCoordinates(int resolutionX, int resolutionY, Envelope datasetBoundary, Polygon spatialObject, boolean reverseSpatialCoordinate, Double objectWeight)

I agree it's not the most efficient approach!