Using in PySpark
sabman opened this issue
Any thoughts on how we could use the library in PySpark?
For spatial joins at least, I was thinking of changing the class to work with WKT and WKB strings instead of only JTS Geometry objects. That should allow us to write something like:
```python
from shapely.geometry import Polygon, Point

jvm = sc._jvm

rectangleA = Polygon([(0, 0), (0, 10), (10, 10), (10, 0)])
rectangleB = Polygon([(-4, -4), (-4, 4), (4, 4), (4, -4)])
rectangleC = Polygon([(7, 7), (7, 8), (8, 8), (8, 7)])
pointD = Point((1, -1))

def geomABWithId():
    return sc.parallelize([
        (0, rectangleA.wkt),
        (1, rectangleB.wkt),
    ])

def geomCWithId():
    return sc.parallelize([(0, rectangleC.wkt)])

def geomABCWithId():
    return sc.parallelize([
        (0, rectangleA.wkt),
        (1, rectangleB.wkt),
        (2, rectangleC.wkt),
    ])

def geomDWithId():
    return sc.parallelize([(0, pointD.wkt)])

predicate = jvm.spatialspark.operator.SpatialOperator.Within()
jvm.spatialspark.join.BroadcastSpatialJoin(
    sc, geomABWithId(), geomABCWithId(), predicate).collect()
```
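To make the interchange idea above concrete: the `.wkt` strings that cross the Python/JVM boundary are just plain text. Here's a tiny pure-Python sketch (no shapely or Spark required) of what the WKT for `rectangleA` looks like; `polygon_wkt` is a hypothetical helper for illustration, not part of any library:

```python
# Hypothetical sketch: geometries travel between Python and the JVM as
# WKT text rather than as JTS objects.

def polygon_wkt(coords):
    """Build a WKT POLYGON string from a list of (x, y) vertices."""
    ring = list(coords) + [coords[0]]  # WKT rings must be explicitly closed
    body = ", ".join("%g %g" % (x, y) for x, y in ring)
    return "POLYGON ((%s))" % body

print(polygon_wkt([(0, 0), (0, 10), (10, 10), (10, 0)]))
# POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))
```

That output matches what shapely's `rectangleA.wkt` produces, so the JVM side only ever needs a WKT reader (e.g. JTS's `WKTReader`) to reconstruct the geometry.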
Thoughts?
I don't have much experience with PySpark, or even Python, so I may need to spend some time to make it work. My current focus is still the core part, which is the implementation on top of the DataFrame/Dataset APIs.
OK cool! I can help with the Python API; I've started working on it. Today I hacked around a bit just to see if I could make it run, and it works pretty well! https://github.com/sabman/SpatialSpark/tree/python-hack I'll close this for now. Let's connect on Gitter.