EdwardRaff / JSAT

Java Statistical Analysis Tool, a Java library for Machine Learning

Do not limit the elements in VPTree to vectors

albertoandreottiATgmail opened this issue

One of the reasons to work in metric spaces is to abstract away from what the elements you're measuring distances between actually are.
They could be images, text, audio samples, Excel spreadsheets, or anything else, as long as they come with a distance that defines a metric space. Why are you limiting this to numeric vectors only?
All that you would need is an interface,

// T is whatever element type the metric is defined over.
public interface MetricDistance<T> {
    double distance(T a, T b);
}

and let the user provide an implementation of that.

  1. The design choice is simply a holdover from 8 years ago. I'm currently working on refactoring much of the distance-based code (as free time permits).
  2. Most distance metrics are defined on numeric features.
  3. Because JSAT is focused specifically on structured data.

The last reason is why I will not be implementing any kind of interface as you've requested. A common trick used in most frameworks is that if you need to work with unstructured data, you store it in a separate array and use 1-dimensional vectors, each holding only an index into that array. Your custom distance function then grabs the correct unstructured objects based on those indices and computes their distance as desired. You can see this style in use in my LZJD project.
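The index trick described above can be sketched roughly as follows. This is only an illustration in plain Java, not JSAT code: the class `IndexedDistanceSketch`, the `handle` and `dist` methods, and the choice of Levenshtein edit distance over strings are all assumptions made for the example. A plain `double[]` of length 1 stands in for a 1-dimensional vector; in JSAT itself the same lookup would presumably live inside a custom distance metric built on the library's own vector type.

```java
// Sketch of the "index trick": the objects live in a side array, and each one
// is represented by a 1-dimensional vector that holds only its index.
// All names here are hypothetical; none of this is JSAT's API.
final class IndexedDistanceSketch {

    // The unstructured data (strings here), stored outside the vectors.
    private final String[] objects;

    IndexedDistanceSketch(String[] objects) {
        this.objects = objects.clone();
    }

    // Builds the 1-dimensional "vector" that stands in for objects[index].
    static double[] handle(int index) {
        return new double[] { index };
    }

    // Distance between two handles: look up the real objects, then compare them.
    double dist(double[] a, double[] b) {
        String s = objects[(int) a[0]];
        String t = objects[(int) b[0]];
        return editDistance(s, t);
    }

    // Levenshtein edit distance, which satisfies the metric axioms on strings.
    static int editDistance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // distance from the empty prefix of s
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // distance to the empty prefix of t
            for (int j = 1; j <= t.length(); j++) {
                int substitute = prev[j - 1] + (s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1);
                int insertOrDelete = Math.min(prev[j], curr[j - 1]) + 1;
                curr[j] = Math.min(substitute, insertOrDelete);
            }
            int[] swap = prev; // reuse the two rows instead of reallocating
            prev = curr;
            curr = swap;
        }
        return prev[t.length()];
    }
}
```

With this layout, a tree built over the handles never needs to know what the objects are. It only ever calls `dist` on 1-dimensional vectors, which is why the approach works without a generic element type.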