Raising ValueError erroneously
garethclews opened this issue
I have installed apricot 0.2.3 from PyPI, and running the example from the README throws a ValueError:
import numpy
from apricot import FacilityLocationSelection
X = numpy.random.normal(100, 1, size=(1000, 25))
X_subset = FacilityLocationSelection(100).fit_transform(X)
results in:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-25-15da598d409a> in <module>()
...snip...
~/.pyenv/versions/3.6.4/lib/python3.6/site-packages/apricot/base.py in fit(self, X, y)
108 raise ValueError("X must have exactly two dimensions.")
109 if numpy.min(X) < 0.0 and numpy.max(X) > 0.:
--> 110 raise ValueError("X cannot contain negative values or must be entirely "\
111 "negative values.")
112
ValueError: X cannot contain negative values or must be entirely negative values.
However, evaluating the same condition on the generated X,
numpy.min(X) < 0.0 and numpy.max(X) > 0.
returns False.
System information:
- Mac OS 10.13.6
- Python 3.6.4
- numpy 1.14.5
- numba 0.39.0
Hi @karetsu
Thanks for reporting the bug. It cropped up in a small optimization I tried to do. When you're using facility location functions, it's not the data set that needs to be non-negative (or entirely negative), it's the pairwise similarity matrix (whereas with feature-based methods the data themselves need to be strictly non-negative). The default similarity for facility location is Euclidean distance, where the distance between a point and itself is 0 in theory but, annoyingly, a small positive value on the order of floating-point precision in the implementation. I resolved the issue by subtracting out the maximal value (typically around machine precision) along the diagonal when using Euclidean distance as the similarity. Try getting the latest code (0.2.4 on PyPI) and let me know whether that fixes the issue.
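For anyone reading along, here is a rough sketch of the failure mode and of one reading of the fix described above. It is not the actual apricot code. The helper `has_mixed_signs`, the use of negated squared Euclidean distance as the similarity, and the `1e-12` round-off value placed on the diagonal are all assumptions made for illustration.

```python
import numpy

def has_mixed_signs(M):
    """Hypothetical helper: the same condition as the check in the traceback."""
    return bool(numpy.min(M) < 0.0 and numpy.max(M) > 0.0)

rng = numpy.random.RandomState(0)
X = rng.normal(100, 1, size=(50, 5))

# The raw data are all positive, so the check passes on X itself.
data_mixed = has_mixed_signs(X)

# Assumed similarity for this sketch: negated squared Euclidean distance,
# which is negative everywhere off the diagonal.
diff = X[:, None, :] - X[None, :, :]
S = -(diff ** 2).sum(axis=-1)

# Simulate round-off: a tiny positive self-similarity on the diagonal
# (the magnitude is made up for illustration).
S[numpy.diag_indices_from(S)] = 1e-12

# Mostly negative entries plus a slightly positive diagonal trip the check.
mixed_before = has_mixed_signs(S)

# One interpretation of the fix: subtract the largest diagonal value
# from every entry, so the matrix becomes entirely non-positive.
S_fixed = S - numpy.diag(S).max()
mixed_after = has_mixed_signs(S_fixed)

print(data_mixed, mixed_before, mixed_after)
```

The point of the sketch is that the check runs on the pairwise matrix rather than on X, which is why evaluating the condition on X directly gives False while fit still raises.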
Please re-open if you encounter this issue again.