Variable K in KNN

Question


Mrz-zz opened this issue.

Thank you very much for developing such a great library. Currently, I have a problem that is bothering me.

In KNN, instead of using a fixed K, I would like to choose the number of neighbors for each graph as a proportion of the number of nodes in that graph. Is there an efficient way to implement this?

At the moment, the only approach I have is to split the batch into separate graphs, run KNN on each one, and then merge the results, but this is not computationally efficient. I would appreciate your opinion on this.
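For reference, the split-and-merge workaround might look roughly like the sketch below. It is a minimal NumPy illustration, not PyG or torch_cluster code: the helper names `k_per_graph` and `knn_per_graph_loop`, the rule k_g = round(ratio * n_g) clamped to [1, n_g - 1], the exclusion of self-loops, and the (query, neighbor) output layout are all assumptions. The Python-level loop over graphs is the part that gets slow on large batches.

```python
import numpy as np


def k_per_graph(counts, ratio):
    # Hypothetical rule: k_g = round(ratio * n_g), clamped to [1, n_g - 1]
    # (upper bound n_g - 1 because a node is not counted as its own neighbor).
    return [min(int(n) - 1, max(1, int(round(ratio * int(n))))) for n in counts]


def knn_per_graph_loop(x, batch, ratio):
    """Baseline: run KNN separately on each graph, then concatenate the edges.

    Returns (rows, cols): rows[e] is the query node, cols[e] its neighbor,
    both as global node indices. Dense distances are for illustration only.
    """
    rows, cols = [], []
    graph_ids, counts = np.unique(batch, return_counts=True)
    for g, k in zip(graph_ids, k_per_graph(counts, ratio)):
        idx = np.flatnonzero(batch == g)           # global indices of graph g
        pts = x[idx].astype(float)
        d = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d, np.inf)                # exclude self-loops
        nbr = np.argsort(d, axis=1, kind="stable")[:, :k]  # k nearest per node
        rows.append(np.repeat(idx, k))
        cols.append(idx[nbr].ravel())              # map back to global indices
    return np.concatenate(rows), np.concatenate(cols)
```

Each graph produces a differently sized block of edges, which is exactly why this is hard to fuse into a single kernel with a fixed-size buffer.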

Matthias Fey · Answer 1 · Sun Apr 09 2023 16:59:23 GMT+0800 (China Standard Time)

Yeah, you are right on this one. Currently, we assume a fixed k across all examples in a batch so that the buffers required during computation have a constant size. Changing this would probably require a separate function. An alternative could be to use max(k) across all examples in the batch and then filter out the invalid neighbors (those ranked beyond each example's own k) in a post-processing step, but this may be inefficient as well if your k differs greatly across examples.
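A rough sketch of the max(k)-then-filter idea, under stated assumptions: it uses NumPy and a dense pairwise distance matrix rather than torch_cluster's actual kernels, and the helpers `k_from_ratio` and `knn_variable_k_maxk`, as well as the rounding rule for k, are hypothetical. Every query node gets a neighbor buffer of the same width k_max, and a boolean mask then drops the slots beyond that node's own k.

```python
import numpy as np


def k_from_ratio(batch, ratio):
    # Hypothetical rule: per-graph k = round(ratio * n_g), clamped to
    # [1, n_g - 1]; assumes graph ids in `batch` are 0..B-1.
    counts = np.bincount(batch)                          # nodes per graph
    k_graph = np.minimum(counts - 1, np.maximum(1, np.round(ratio * counts)))
    return k_graph.astype(int)[batch]                    # one k per node


def knn_variable_k_maxk(x, batch, k):
    """KNN with a per-node k via a fixed-width max(k) buffer plus a mask.

    Returns (rows, cols): rows[e] is the query node, cols[e] its neighbor.
    """
    x = np.asarray(x, dtype=float)
    k = np.asarray(k)
    n = x.shape[0]
    k_max = int(k.max()) if n else 0
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)   # dense (n, n) distances
    d[batch[:, None] != batch[None, :]] = np.inf         # no cross-graph edges
    np.fill_diagonal(d, np.inf)                          # no self-loops
    nbr = np.argsort(d, axis=1, kind="stable")[:, :k_max]  # (n, k_max) buffer
    # Post-processing: keep slot j of node i only if j < k[i].
    keep = np.arange(nbr.shape[1])[None, :] < k[:, None]
    rows = np.broadcast_to(np.arange(n)[:, None], nbr.shape)[keep]
    cols = nbr[keep]
    return rows, cols
```

The mask discards n * k_max - sum(k) buffer slots, so the wasted work grows with the gap between the smallest and largest k, which is the inefficiency the answer warns about when k varies widely across examples.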