yzhao062 / pyod

A Comprehensive and Scalable Python Library for Outlier Detection (Anomaly Detection)

Home Page: http://pyod.readthedocs.io


How will the effectiveness of the model be evaluated, and does the library provide the appropriate methodology?

yuchiu503 opened this issue · comments

Hello, I am learning data mining and using ABOD for anomaly detection. After building the model, I do not know whether its performance and accuracy are good, so I need to evaluate the whole model. Could you help me?

In addition, I see that your example (abod_example.py) uses the ROC score and the Precision @ rank n score, but there are no y values (labels) in my dataset, so these two evaluations cannot be used.

from pyod.models.abod import ABOD
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np


df = pd.read_csv(
    r"D:\WorkSpace\apple_quality.csv"
)
df.dropna(axis=0, inplace=True)

# Keep only the numeric columns, since ABOD works on numeric features
df_num = df.select_dtypes(include=np.number)

X_train, X_test = train_test_split(df_num, test_size=0.2, random_state=42)

model = ABOD().fit(X_train)

# Outlier scores for the training data and for unseen data
decision_scores = model.decision_scores_
test_scores = model.decision_function(X_test)

Hi,

If your dataset doesn't provide any labels for the data, you cannot use the classic metrics derived from a confusion matrix. I suggest using the silhouette score (for example) provided by the sklearn library, or any of the other metrics described here:

https://towardsdatascience.com/how-to-evaluate-clustering-performance-without-ground-truth-labels-9c9792ec1c54