alitouka / spark_dbscan

DBSCAN clustering algorithm on top of Apache Spark

Population of partition index too expensive

Elbehery opened this issue · comments

@alitouka

I am running the algorithm on a machine with 64 GB of RAM and 16 CPUs. I have 500 MB of data, which is relatively small. However, the algorithm always spends a lot of time in the "Population of partition index" step.

Do you have any suggestions about this?

How many dimensions do your data have?

Only two: latitude and longitude.
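For context, a minimal sketch of how clustering is typically invoked with this library, following the usage shown in the project README. The `PartitioningSettings` class and its points-per-box parameter are an assumption drawn from the library's source and may differ by version; coarser partitioning (more points per box) is one lever that could reduce time spent populating the partition index.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.alitouka.spark.dbscan._

// Assumed API, following the project README; names and signatures
// should be checked against the version of spark_dbscan in use.
val sc = new SparkContext(new SparkConf().setAppName("dbscan-example"))

// Two-dimensional input (e.g. latitude, longitude per line of a CSV)
val data = IOHelper.readDataset(sc, "/path/to/data.csv")

val clusteringSettings = new DbscanSettings()
  .withEpsilon(0.01)      // eps, in the units of your coordinates
  .withNumberOfPoints(10) // minPts

// Hypothetical tuning knob: fewer, larger boxes make the partition
// index cheaper to build, at the cost of a coarser data distribution.
val partitioningSettings = new PartitioningSettings(numberOfPointsInBox = 100000)

val model = Dbscan.train(data, clusteringSettings, partitioningSettings)
IOHelper.saveClusteringResult(model, "/path/to/output")
```

Note that with raw latitude/longitude, `eps` is expressed in degrees, so its effective distance varies with latitude; projecting coordinates first may give more uniform clusters.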