vc1492a / PyNomaly

Anomaly detection using LoOP: Local Outlier Probabilities, a local density based outlier detection method providing an outlier score in the range of [0,1].

Division by zero when including cluster labels

aadharna opened this issue

When using Kriegel et al.'s original 2D synthetic dataset and including the cluster labels, the result is a divide-by-zero error.

[Two screenshots attached, taken 2018-07-30 at 9:12:02 AM and 9:11:47 AM]

Without the cluster labels, the algorithm runs to completion but produces the result we talked about last week (slightly too confident probability values). The two behaviors may be related, but since I am not sure, I thought it better to mention both issues.

@aadharna Thanks for opening this issue! I have determined the root cause of the ZeroDivisionError.
PyNomaly checks that the number of neighbors specified is less than or equal to the total number of observations. However, when cluster labels are supplied, it currently does not check that the number of neighbors is less than the size of the smallest cluster. In this case, the smallest cluster (the noise cluster) has 10 members, but the number of neighbors is specified as 20.
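For anyone hitting this before a fix lands, the missing check can be approximated on the caller's side. The sketch below is not PyNomaly code: `check_neighbors_against_clusters` is a hypothetical helper name, and it only assumes that cluster labels are available as a plain sequence with one label per observation.

```python
import warnings
from collections import Counter


def check_neighbors_against_clusters(cluster_labels, n_neighbors):
    """Hypothetical helper (not part of PyNomaly).

    Returns True when n_neighbors is strictly less than the size of the
    smallest cluster; otherwise emits a UserWarning and returns False.
    """
    sizes = Counter(cluster_labels)
    # Find the label with the fewest members and its member count.
    smallest_label, smallest_size = min(sizes.items(), key=lambda kv: kv[1])
    if n_neighbors >= smallest_size:
        warnings.warn(
            f"n_neighbors={n_neighbors} is not less than the size of the "
            f"smallest cluster (label {smallest_label!r}, "
            f"{smallest_size} members).",
            UserWarning,
        )
        return False
    return True


# Illustrative labels only: two larger clusters plus a 10-point noise cluster.
labels = [0] * 200 + [1] * 200 + [-1] * 10
check_neighbors_against_clusters(labels, n_neighbors=20)  # warns, returns False
check_neighbors_against_clusters(labels, n_neighbors=9)   # returns True
```

The cluster sizes of 200 are made up for illustration; only the 10-member noise cluster and the neighbor count of 20 come from this report.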

I plan to introduce this check in the next release, 0.2.2, and will suggest that users run clusters separately if they want to specify a different number of nearest neighbors for each cluster. Thanks again for bringing this to my attention!
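Running clusters separately could look roughly like the following. This is only a sketch of the data preparation step: `split_by_cluster` and `neighbors_per_cluster` are hypothetical helper names, and capping the neighbor count at one less than the cluster size is an assumption made here, not a rule taken from PyNomaly.

```python
from collections import defaultdict


def split_by_cluster(points, cluster_labels):
    """Group observations by cluster label (hypothetical helper)."""
    groups = defaultdict(list)
    for point, label in zip(points, cluster_labels):
        groups[label].append(point)
    return dict(groups)


def neighbors_per_cluster(groups, requested):
    """Pick a neighbor count per cluster (hypothetical helper).

    Assumption: cap the requested count at (cluster size - 1) so that it
    stays strictly below the cluster size.
    """
    return {
        label: min(requested, len(members) - 1)
        for label, members in groups.items()
    }


# Tiny illustrative dataset: three points in cluster 0, two noise points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (9.0, 1.0)]
labels = [0, 0, 0, -1, -1]

groups = split_by_cluster(points, labels)
k_by_cluster = neighbors_per_cluster(groups, requested=20)
```

Each group, together with its own neighbor count, could then be scored in a separate run. A cluster with a single member ends up with a count of 0 under this cap and would need to be handled separately.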

Addressed in 0.2.3. A UserWarning is now issued when this occurs.