alitouka / spark_dbscan

DBSCAN clustering algorithm on top of Apache Spark

Keep unique id of a point

lucaventurini opened this issue · comments

Let's say we want to cluster some objects on a subset of their features. We then transform these objects into Points, where the said subset will become the coordinates of the Points. We want to keep track of the remainder of the features not used for the algorithm. How do we proceed?
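For concreteness, the workflow described above can be sketched in plain Python (the record and field names are made up for illustration, not taken from spark_dbscan):

```python
# Hypothetical records: cluster on (x, y) only, keep the rest as metadata.
records = [
    {"id": 1, "x": 1.0, "y": 2.0, "label": "sensor-A"},
    {"id": 2, "x": 5.0, "y": 6.0, "label": "sensor-B"},
]

# Coordinates fed to the clustering algorithm...
points = [(r["x"], r["y"]) for r in records]

# ...and the remaining features we would like to re-attach afterwards,
# keyed by the unique id the issue asks to preserve.
metadata = {r["id"]: r["label"] for r in records}
```

The question is how to get from the cluster labels assigned to `points` back to the entries of `metadata` once the algorithm has run.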

A possible solution is to track each point by a unique identifier. The current source code already has such a field, but even if I force it to a value, it is overwritten at some point during pre-partitioning, and by the end of the algorithm all the identifiers have changed, so no join with the initial dataset is possible.

I think this is a critical issue, if confirmed. Joining the result of a clustering with some metadata is the most useful, if not the only, postprocessing step for making something of the results of DBSCAN (or any clustering algorithm).

At the moment there is no way to attach metadata to the points. The only available way is to perform a join at the end using the coordinates...
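The coordinate-based join suggested above can be sketched in plain Python (names are illustrative; this is not spark_dbscan's API). Clustering output is keyed by the coordinate tuple and looked up against the original records:

```python
# Hypothetical sketch: re-attach metadata to clustering output by
# joining on exact coordinates.

def join_on_coordinates(original, clustered):
    """original: list of (coords, metadata) pairs;
    clustered: list of (coords, cluster_id) pairs.
    Returns a list of (coords, metadata, cluster_id) triples."""
    labels = {tuple(coords): cid for coords, cid in clustered}
    return [(coords, meta, labels.get(tuple(coords)))
            for coords, meta in original]

original = [((1.0, 2.0), {"name": "a"}), ((5.0, 6.0), {"name": "b"})]
clustered = [((1.0, 2.0), 0), ((5.0, 6.0), 1)]
result = join_on_coordinates(original, clustered)
```

Note the fragility: this relies on exact floating-point equality and breaks down if two points share identical coordinates, which is exactly why a preserved unique id would be preferable.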

If you want to attach metadata, you need to create a proper field in the Point object and refactor the code, since Points are re-created several times in the algorithm...
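The refactor suggested here implies that every place the algorithm re-creates a Point must carry the metadata field along. A minimal illustration in Python (a stand-in class, not the actual spark_dbscan `Point`):

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class Point:
    coordinates: tuple
    cluster_id: int = -1
    metadata: dict = field(default_factory=dict)  # hypothetical extra field

# Wherever the algorithm rebuilds a point (e.g. to assign a cluster id),
# the metadata must be copied across, otherwise it is silently lost:
p = Point((1.0, 2.0), metadata={"name": "a"})
p2 = replace(p, cluster_id=0)  # metadata survives the re-creation
```

This copy-on-rebuild requirement is what makes the change a refactor rather than a one-line addition.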

I see your point, but a full join is an expensive operation that could be avoided if only the id were preserved.
So you confirm the id is rewritten during the runs on purpose?

Yes, the pointId you can see there is for internal processing only; no metadata storage is supported so far.