Does this project have the R-tree feature currently?

Question

ChenZhongPu opened this issue 9 years ago · comments

For some operations, building an R-tree index over MBRs (minimum bounding rectangles) is much more efficient. Does this project support it?

syoummer · Answer 1 · Wed Mar 18 2015 10:59:36 GMT+0800 (China Standard Time)

I have an R-tree implementation on Spark, but it did not seem very efficient in my benchmarks, so I decided to remove it for now.

zhongpu · Answer 2 · Wed Mar 18 2015 12:22:40 GMT+0800 (China Standard Time)

Can I bulk load the data and build the index directly with JTS, and then save the index to a file for storage? See http://stackoverflow.com/questions/29113702/strtree-in-jts-topology-suite-bulk-load-data-and-build-index for more details.

zhongpu · Answer 3 · Wed Mar 18 2015 15:12:11 GMT+0800 (China Standard Time)

Here is your code in BroadcastSpatialJoin.scala:

    // Create an R-tree on the right dataset
    val strtree = new STRtree()
    val rightGeometryWithIdLocal = rightGeometryWithId.collect()
    rightGeometryWithIdLocal.foreach { x =>
      // Index each geometry by its envelope, expanded by the join radius
      val y = x._2.getEnvelopeInternal
      y.expandBy(radius)
      strtree.insert(y, x)
    }
    val rtreeBroadcast = sc.broadcast(strtree)
    leftGeometryWithId.flatMap(x => queryRtree(rtreeBroadcast, x._1, x._2, joinPredicate, radius))
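
The snippet calls queryRtree, which is not shown in this thread. As a rough illustration only, the filter-and-refine step it presumably performs might look like the sketch below. The names FilterRefineSketch, queryWithinDistance and demo are hypothetical, only a within-distance predicate is handled, and the org.locationtech.jts package is an assumption (older JTS releases use com.vividsolutions.jts).

```scala
import org.locationtech.jts.geom.{Coordinate, Geometry, GeometryFactory}
import org.locationtech.jts.index.strtree.STRtree

object FilterRefineSketch {
  // Hypothetical stand-in for queryRtree; handles a within-distance join only.
  def queryWithinDistance(tree: STRtree, leftId: Long, leftGeom: Geometry,
                          radius: Double): Seq[(Long, Long)] = {
    // Filter: the tree returns every item whose stored (expanded) envelope
    // intersects the query envelope. These are candidates, not final matches.
    val candidates = tree.query(leftGeom.getEnvelopeInternal)
    val matches = scala.collection.mutable.ArrayBuffer.empty[(Long, Long)]
    val it = candidates.iterator()
    while (it.hasNext) {
      val (rightId, rightGeom) = it.next().asInstanceOf[(Long, Geometry)]
      // Refine: run the exact geometric test on each candidate.
      if (leftGeom.isWithinDistance(rightGeom, radius)) matches += ((leftId, rightId))
    }
    matches.toSeq
  }

  // Small local usage example with made-up points and ids.
  def demo(): Seq[(Long, Long)] = {
    val gf = new GeometryFactory()
    val radius = 1.0
    val right: Seq[(Long, Geometry)] = Seq(
      (1L, gf.createPoint(new Coordinate(0.0, 0.0))),
      (2L, gf.createPoint(new Coordinate(10.0, 10.0))))
    val tree = new STRtree()
    right.foreach { x =>
      val env = x._2.getEnvelopeInternal
      env.expandBy(radius) // same expansion as in the snippet above
      tree.insert(env, x)
    }
    queryWithinDistance(tree, 100L, gf.createPoint(new Coordinate(0.5, 0.0)), radius)
  }
}
```

In the Spark version, the same lookup would run inside flatMap against rtreeBroadcast.value, so each task probes its own local copy of the tree.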

If the right dataset is large, will the tree (strtree) still fit in memory?

I do not know much about parallel computing. Do RDD operations parallelize this automatically?

syoummer · Answer 4 · Wed Mar 18 2015 20:33:40 GMT+0800 (China Standard Time)

The broadcast-based join assumes that the right dataset fits in memory, as described in our tech report. If that is not the case, the partition-based join is the solution.
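
To make the partition-based idea concrete, here is a minimal sketch of a uniform-grid join on Spark. It is not the project's implementation: GridJoinSketch, cellsFor, gridJoin and the cellSize parameter are hypothetical names, the predicate is fixed to intersects, and the org.locationtech.jts package is an assumption (older JTS releases use com.vividsolutions.jts).

```scala
import org.apache.spark.rdd.RDD
import org.locationtech.jts.geom.{Envelope, Geometry}

object GridJoinSketch {
  // Hypothetical helper: all cells of a uniform grid (side length cellSize)
  // that the given envelope overlaps.
  def cellsFor(env: Envelope, cellSize: Double): Seq[(Int, Int)] = {
    val x0 = math.floor(env.getMinX / cellSize).toInt
    val x1 = math.floor(env.getMaxX / cellSize).toInt
    val y0 = math.floor(env.getMinY / cellSize).toInt
    val y1 = math.floor(env.getMaxY / cellSize).toInt
    for (cx <- x0 to x1; cy <- y0 to y1) yield (cx, cy)
  }

  // Neither side is collected to the driver: both are keyed by grid cell,
  // so only geometries that share a cell are ever compared.
  def gridJoin(left: RDD[(Long, Geometry)], right: RDD[(Long, Geometry)],
               cellSize: Double): RDD[(Long, Long)] = {
    val leftByCell = left.flatMap { case (id, g) =>
      cellsFor(g.getEnvelopeInternal, cellSize).map(c => (c, (id, g)))
    }
    val rightByCell = right.flatMap { case (id, g) =>
      cellsFor(g.getEnvelopeInternal, cellSize).map(c => (c, (id, g)))
    }
    leftByCell.join(rightByCell)
      // Refine: exact test on every pair that landed in the same cell.
      .filter { case (_, ((_, lg), (_, rg))) => lg.intersects(rg) }
      .map { case (_, ((lid, _), (rid, _))) => (lid, rid) }
      // A pair that shares several cells is produced once per cell.
      .distinct()
  }
}
```

The plain join compares every left and right geometry within a cell. A fuller version would likely build a local index per cell and pick the grid from the data distribution; both are left out here.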

zhongpu · Answer 5 · Thu Mar 19 2015 00:32:56 GMT+0800 (China Standard Time)

As for my second question:

"Can I bulk load the data and build the index directly with JTS, and then save the index to a file for storage?" See http://stackoverflow.com/questions/29113702/strtree-in-jts-topology-suite-bulk-load-data-and-build-index for more details.

It seems that the R-tree in JTS performs the bulk loading when the query method is first called. Therefore, saving the strtree object to a file for future use seems to make no sense. Right?
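
For reference, a minimal sketch of what saving a tree could look like, assuming plain Java serialization. STRtree is Serializable (the broadcast in the snippet above relies on this), and calling build() forces the bulk load before the tree is written. StrTreePersistenceSketch, save, load and demo are hypothetical names, the org.locationtech.jts package is an assumption (older releases use com.vividsolutions.jts), and whether reloading is actually faster than rebuilding has not been measured here.

```scala
import java.io.{File, FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}
import org.locationtech.jts.geom.{Coordinate, Envelope, GeometryFactory}
import org.locationtech.jts.index.strtree.STRtree

object StrTreePersistenceSketch {
  // Build the tree explicitly, then write it out with Java serialization.
  // Every inserted item must itself be Serializable.
  def save(tree: STRtree, file: File): Unit = {
    tree.build() // force the bulk load now instead of at the first query
    val out = new ObjectOutputStream(new FileOutputStream(file))
    try out.writeObject(tree) finally out.close()
  }

  def load(file: File): STRtree = {
    val in = new ObjectInputStream(new FileInputStream(file))
    try in.readObject().asInstanceOf[STRtree] finally in.close()
  }

  // Round trip with two made-up points; returns the number of hits
  // the reloaded tree reports around the origin.
  def demo(): Int = {
    val gf = new GeometryFactory()
    val tree = new STRtree()
    Seq("a" -> new Coordinate(0.0, 0.0), "b" -> new Coordinate(10.0, 10.0)).foreach {
      case (id, c) => tree.insert(gf.createPoint(c).getEnvelopeInternal, id)
    }
    val file = File.createTempFile("strtree", ".bin")
    try {
      save(tree, file)
      load(file).query(new Envelope(-1.0, 1.0, -1.0, 1.0)).size()
    } finally file.delete()
  }
}
```

This only covers a tree that fits on one machine. It does not address the very large datasets discussed in the rest of the thread.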

syoummer · Answer 6 · Thu Mar 19 2015 00:55:19 GMT+0800 (China Standard Time)

It depends.

From my understanding, you are trying to use JTS to bulk load an R-tree for a very large dataset, which I do not think is feasible. As I mentioned, I have an R-tree implementation on Spark without JTS for that purpose, but its performance is not very good. I am now thinking about implementing an R-tree structure similar to the one in SpatialHadoop. I have already implemented several components, but I have not had time to finish it recently.