yzhao062 / pyod

A Comprehensive and Scalable Python Library for Outlier Detection (Anomaly Detection)

Home Page: http://pyod.readthedocs.io


How will the effectiveness of the model be evaluated, and does the library provide the appropriate methodology?

yuchiu503 opened this issue · comments

Hello, I am learning data mining and using ABOD for anomaly detection. After building the model, I do not know whether its performance and accuracy are good, so I need to evaluate the whole model. Could you help me?

In addition, I see that your example (abod_example.py) uses the ROC score and the Precision @ rank n score, but there are no y values (labels) in my dataset, so these two evaluations cannot be used.

from pyod.models.abod import ABOD
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np


df = pd.read_csv(
    r"D:\WorkSpace\apple_quality.csv"
)
df.dropna(axis=0, inplace=True)

# Keep only the numeric columns, since ABOD works on numeric features
df_num = df.select_dtypes(include=np.number)

X_train, X_test = train_test_split(df_num, test_size=0.2, random_state=42)

model = ABOD().fit(X_train)

# Outlier scores for the training data and for unseen data
decision_scores = model.decision_scores_
test_scores = model.decision_function(X_test)

Hi,

If your dataset doesn't provide any labels for the data, you cannot use the classic metrics derived from a confusion matrix. I suggest using the silhouette score (for example) provided by the sklearn library, or any of the other metrics described here:

https://towardsdatascience.com/how-to-evaluate-clustering-performance-without-ground-truth-labels-9c9792ec1c54