Predict the probability of testing data

Question

zhaohan-xi opened this issue 2 years ago

Hi, dear author. Does this package contain an API for scoring independent test data after fitting? For instance, something like:

m = loop.LocalOutlierProbability(data).fit()

scores_of_test_data = m.local_outlier_probabilities(test_data)

Here "data" is used for training (fitting) and "test_data" is a separate np.array used for testing only. We want to know whether the observations in "test_data" are outliers relative to the training "data", without putting the two together for fitting (because refitting every time takes a long time).

Does this package have such an API?

Valentino Constantinou · Answer 1 · Wed Jun 29 2022 05:19:26 GMT+0800 (China Standard Time)

Hi @HarrialX, good question.

The original Local Outlier Probability (LoOP) approach was designed purely as an unsupervised method over existing data: it is meant to be rerun over the entire dataset each time new data is observed.

However, a separate section of readme.md explains how to use an alternative version of LoOP that was developed for exactly this situation, where "new" data is observed and scores are wanted for those observations only. It's intended for "streaming" data, but I think that approach (and this package / API) could work well for your case.

Read the streaming-data section of readme.md and you should be set for your use case. If that doesn't fit, I'd suggest an alternative outlier detection method.
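
To make the idea of scoring "new" observations against already-fitted data concrete, here is a minimal pure-Python sketch. It is not the package's code and does not use its API. It assumes the standard LoOP definitions (probabilistic set distance, PLOF, and an erf-based normalization) and holds the training statistics fixed while a new point is scored. The names fit_reference and score_new are hypothetical, chosen only for this illustration.

```python
import math

def _dist(a, b):
    # Euclidean distance between two points given as sequences of floats.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def _neighbors(point, data, k, skip=None):
    # Indices of the k rows of `data` closest to `point`,
    # optionally skipping one index (the point itself during fitting).
    order = sorted((i for i in range(len(data)) if i != skip),
                   key=lambda i: _dist(point, data[i]))
    return order[:k]

def _pdist(point, data, idx, extent):
    # Probabilistic set distance: extent * sqrt(mean squared neighbor distance).
    return extent * math.sqrt(sum(_dist(point, data[i]) ** 2 for i in idx) / len(idx))

def fit_reference(train, k=3, extent=3):
    # Hypothetical helper: precompute the training statistics once.
    nbrs = [_neighbors(p, train, k, skip=i) for i, p in enumerate(train)]
    pdist = [_pdist(p, train, nbrs[i], extent) for i, p in enumerate(train)]
    plof = [pdist[i] / (sum(pdist[j] for j in nbrs[i]) / k) - 1.0
            for i in range(len(train))]
    nplof = extent * math.sqrt(sum(v * v for v in plof) / len(plof))
    return {"train": train, "k": k, "extent": extent,
            "pdist": pdist, "nplof": nplof}

def score_new(ref, point):
    # Hypothetical helper: score one unseen point without refitting.
    # Neighbors come from the training set only, and nPLOF stays fixed.
    nbrs = _neighbors(point, ref["train"], ref["k"])
    pd_new = _pdist(point, ref["train"], nbrs, ref["extent"])
    expected = sum(ref["pdist"][j] for j in nbrs) / len(nbrs)
    plof = pd_new / expected - 1.0
    return max(0.0, math.erf(plof / (ref["nplof"] * math.sqrt(2.0))))

# Small illustrative training set (made-up values).
train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
         (0.15, 0.05), (0.05, 0.25), (0.25, 0.15), (0.1, 0.1)]
ref = fit_reference(train, k=3, extent=3)
near = score_new(ref, (0.12, 0.12))  # inside the cluster
far = score_new(ref, (5.0, 5.0))     # far from the cluster
print(near, far)
```

The sketch assumes no duplicate training points (a zero neighbor distance would cause a division by zero) and uses a brute-force neighbor search, so it is only suitable for small data. Because the training statistics are frozen, the scores are an approximation of what a full refit including the new point would give.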