vecxoz / vecstack

Python package for stacking (machine learning technique)

metric=auc

qiaohaoforever opened this issue

Great job!
Can I use metric=auc when I want to use a classifier?

Thanks!

You can define any metric you want as a function of the form def my_metric(y_true, y_pred).
If your metric needs class labels in y_pred, call the stacking function with needs_proba=False.
If your metric needs probabilities in y_pred, call the stacking function with needs_proba=True.
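
To make the difference between the two cases concrete, here is a minimal sketch of one metric of each kind. The names accuracy and multiclass_log_loss are illustrative and are not part of vecstack, and the log loss assumes that class labels are the integers 0..n_classes-1.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Label-based metric: y_pred is a 1d array of predicted class labels."""
    return float(np.mean(y_true == y_pred))

def multiclass_log_loss(y_true, y_pred, eps=1e-15):
    """Probability-based metric: y_pred is a 2d array, one column per class.

    Assumes class labels are the integers 0..n_classes-1, so each label
    can be used to index its own probability column.
    """
    proba = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    true_class_proba = proba[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(true_class_proba)))

# Hand-made data for illustration only
y_true = np.array([0, 1, 2, 1])
labels = np.array([0, 1, 1, 1])            # what a label-based metric receives
proba = np.array([[0.8, 0.1, 0.1],         # what a probability-based metric receives
                  [0.2, 0.7, 0.1],
                  [0.1, 0.5, 0.4],
                  [0.3, 0.6, 0.1]])

acc = accuracy(y_true, labels)             # 3 of 4 labels match
ll = multiclass_log_loss(y_true, proba)
print(acc, ll)
```

The first function would be passed as metric=accuracy together with needs_proba=False, the second as metric=multiclass_log_loss together with needs_proba=True.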

Below I'll show how to define a ROC AUC metric that works for both binary and multiclass classification. The easiest way is to use roc_auc_score from the scikit-learn package, but to make it work we need to transform the true class labels into a one-hot encoding.
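
As a small illustration of that transformation, here is a NumPy-only sketch with made-up labels (the complete example uses scikit-learn's OneHotEncoder instead). Each label becomes a row with a single 1 in the column of its class, so the encoded array has the same shape as the predicted probabilities, and each column is a binary one-vs-rest target.

```python
import numpy as np

# Made-up labels for illustration: 4 samples, 3 classes
y_true = np.array([0, 2, 1, 2])

# Sorted unique class labels define the column order
classes = np.unique(y_true)

# Compare each label (as a column vector) against every class
one_hot = (y_true.reshape(-1, 1) == classes).astype(int)
print(one_hot)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [0 0 1]]
```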

Please look at the complete example:

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import OneHotEncoder
from vecstack import stacking

# Define ROC AUC metric
def auc(y_true, y_pred):
    """ROC AUC metric for both binary and multiclass classification.
    
    Parameters
    ----------
    y_true : 1d numpy array
        True class labels
    y_pred : 2d numpy array
        Predicted probabilities for each class
    """
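    # The `sparse` argument was renamed to `sparse_output` in scikit-learn 1.2
    # and removed in 1.4. Calling .toarray() on the default sparse output
    # works with either spelling.
    ohe = OneHotEncoder()
    y_true = ohe.fit_transform(y_true.reshape(-1, 1)).toarray()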
    auc_score = roc_auc_score(y_true, y_pred)
    return auc_score

# Create data: 500 examples, 5 features, 3 classes
X, y = make_classification(n_samples=500, n_features=5, 
                           n_informative=3, n_redundant=1, 
                           n_classes=3, flip_y=0, 
                           random_state=0)

# Make train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.2, 
                                                    random_state=0)

# Init 1st level models
models = [
    RandomForestClassifier(random_state=0, n_jobs=-1, 
                           n_estimators=100, max_depth=3),
]

# Perform stacking
S_train, S_test = stacking(models,
                           X_train, y_train, X_test,
                           regression=False, # classification task
                           needs_proba=True, # predict probabilities
                           metric=auc,       # metric
                           verbose=2)
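
A custom metric can also be sanity-checked on hand-made arrays before it is passed to stacking. The sketch below repeats the auc function in a variant that calls .toarray() on the encoder output (an assumption made here so that it runs whether the installed scikit-learn spells the argument sparse or sparse_output) and checks it on perfect and on constant predictions. The array values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import OneHotEncoder

def auc(y_true, y_pred):
    """ROC AUC for binary and multiclass targets (same idea as above)."""
    # Dense one-hot matrix, one column per class
    y_true_ohe = OneHotEncoder().fit_transform(y_true.reshape(-1, 1)).toarray()
    return roc_auc_score(y_true_ohe, y_pred)

# Made-up labels: 6 samples, 3 classes
y_true = np.array([0, 1, 2, 0, 1, 2])

# Perfect predictions: probability 1 for the true class
perfect = np.eye(3)[y_true]

# Constant predictions: the same probabilities for every sample
constant = np.full((6, 3), 1.0 / 3.0)

auc_perfect = auc(y_true, perfect)
auc_constant = auc(y_true, constant)
print(auc_perfect)   # 1.0
print(auc_constant)  # 0.5
```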