biodiv / anycluster

Server-side clustering of map markers for (Geo)Django

number of database queries

clime opened this issue

There is a significant performance hit from issuing a separate SELECT for each cell in gridCluster and kmeansCluster. Have you thought about reducing this to just one (or a few) queries? I am not completely sure that it is possible, but I feel it should be, and it would improve performance greatly (especially if you have lots of cells). I am looking for a way to do it, but I would like to hear what you think first.

For the kmeans method, this should be possible and is an interesting idea. One would have to calculate the number of visible cells and then request k*cellcount clusters in total. After that, only one SELECT would be needed, targeting the current (grid) bounds instead of each grid cell. Furthermore, this would reduce the number of times the distance cluster has to be run. I will give that a try. Thank you for this input.
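A minimal sketch of the k*cellcount calculation described above, in plain Python. The function names, the bounds tuple layout, and the assumption that bounds and cell size share the same units (for example pixels) are all hypothetical and not taken from anycluster itself.

```python
import math


def visible_cell_count(min_x, min_y, max_x, max_y, cell_size):
    """Count the grid cells needed to cover the viewport (hypothetical helper)."""
    cols = math.ceil((max_x - min_x) / cell_size)
    rows = math.ceil((max_y - min_y) / cell_size)
    return cols * rows


def total_clusters(bounds, cell_size, k_per_cell):
    """Total k for a single kmeans run over the whole viewport.

    bounds is assumed to be (min_x, min_y, max_x, max_y).
    """
    return k_per_cell * visible_cell_count(*bounds, cell_size)
```

With this total, a single SELECT over the viewport bounds could replace the per-cell queries, at the cost of one larger kmeans run.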

For the gridCluster, I currently don't know how the number of SELECT queries could be reduced, but that does not mean it is not possible. If you (or anyone else) know a solution, it would be highly appreciated.

I might have found a way to query the database only once, by using a grid calculated by a PostGIS function:
http://gis.stackexchange.com/questions/16374/how-to-create-a-regular-polygon-grid-in-postgis
Hopefully I will find the time to test this.
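To illustrate why one pass over the data can be enough for grid clustering, here is a small sketch in plain Python. It is not the PostGIS grid function from the link above: it swaps in simple integer binning, where each point is assigned to a cell by flooring its coordinates. The function name and the input format (an iterable of (x, y) tuples in the same units as cell_size) are assumptions.

```python
import math
from collections import defaultdict


def grid_cluster(points, cell_size):
    """Bin points into square cells in a single pass (hypothetical helper).

    Returns {(col, row): {"count": n, "center": (mean_x, mean_y)}}.
    """
    # Per cell: [point count, sum of x, sum of y]
    cells = defaultdict(lambda: [0, 0.0, 0.0])
    for x, y in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cell = cells[key]
        cell[0] += 1
        cell[1] += x
        cell[2] += y
    return {
        key: {"count": n, "center": (sx / n, sy / n)}
        for key, (n, sx, sy) in cells.items()
    }
```

The same grouping could presumably be pushed into the database as a single aggregate query that groups by the floored cell indices, which is the effect the grid-based approach aims for.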

Number of queries reduced by using temporary tables.

Good job. I can't test it because I am travelling, but good job.