Division by zero when including cluster labels
aadharna opened this issue
When using Kriegel et al.'s original 2D synthetic dataset with the cluster labels included, the result is a divide-by-zero error.
Without the cluster labels, the algorithm runs to completion, but produces the result we talked about last week (slightly too confident probability values). The two behaviors may be related, but since I am not sure, I thought it better to mention both issues.
@aadharna Thanks for opening this issue! I have determined the root cause of the ZeroDivisionError.
PyNomaly checks that the number of neighbors specified is less than or equal to the total number of observations. However, when cluster labels are supplied, it currently does not check that the number of neighbors is less than the smallest cluster size. In this case, the smallest cluster (the noise cluster) has 10 observations, but the number of neighbors is set to 20.
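As a rough illustration of the missing check, here is a minimal sketch in plain Python. The helper names `smallest_cluster_size` and `check_n_neighbors` are hypothetical and are not part of PyNomaly; the sketch only assumes that the cluster labels are available as a sequence with one label per observation.

```python
import warnings
from collections import Counter


def smallest_cluster_size(cluster_labels):
    """Return the number of observations in the smallest cluster."""
    return min(Counter(cluster_labels).values())


def check_n_neighbors(cluster_labels, n_neighbors):
    """Hypothetical check: warn and return False when n_neighbors is not
    strictly less than the smallest cluster size, otherwise return True."""
    smallest = smallest_cluster_size(cluster_labels)
    if n_neighbors >= smallest:
        warnings.warn(
            f"n_neighbors={n_neighbors} is not less than the smallest "
            f"cluster size ({smallest}).",
            UserWarning,
        )
        return False
    return True
```

With a noise cluster of 10 observations and `n_neighbors=20`, as in this report, `check_n_neighbors` would emit a `UserWarning` and return `False`.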
I plan to introduce this check in the next release, 0.2.2, and will suggest that users run clusters separately if they want to specify a different number of nearest neighbors for each cluster. Thanks again for bringing this to my attention!
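One way to picture "running clusters separately" is the sketch below. It is not PyNomaly code: `score_per_cluster` is a hypothetical helper, and `score_fn` is a placeholder for whatever scorer is used, assumed here to take a list of observations and a neighbor count and to return one score per observation. Capping the neighbor count at one less than the cluster size is also an assumption made for illustration.

```python
def score_per_cluster(data, cluster_labels, n_neighbors, score_fn):
    """Score each cluster on its own, capping the neighbor count per cluster.

    score_fn(observations, k) is a caller-supplied placeholder that returns
    one score per observation. Clusters too small to have any neighbor
    (a single observation) are left as None.
    """
    # Group observation indices by cluster label.
    groups = {}
    for idx, label in enumerate(cluster_labels):
        groups.setdefault(label, []).append(idx)

    scores = [None] * len(data)
    for indices in groups.values():
        # An observation can have at most (cluster size - 1) neighbors.
        k = min(n_neighbors, len(indices) - 1)
        if k < 1:
            continue
        cluster_scores = score_fn([data[i] for i in indices], k)
        for i, score in zip(indices, cluster_scores):
            scores[i] = score
    return scores
```

The results are written back in the original observation order, so the output lines up with the input data regardless of how the labels are interleaved.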
Addressed in 0.2.3. A UserWarning is now issued when this occurs.