How to combine early stopping?
ZeroAlcoholic opened this issue · comments
Thanks for your contribution. I was looking for a good API for stacking and found your package.
I am wondering whether it is possible to combine `early_stopping` in LightGBM or `EarlyStopping` in Keras with vecstack (because I don't know how to do it)?
EDIT
The `predict` call inside the class definition was modified.
Before: `num_iteration=super(WrapLGB, self).best_iteration_`
Now: `num_iteration=self.best_iteration_`
See the explanation in the comment below.
Yes, it's possible. This task is accomplished by passing the estimator's `fit` and `predict` arguments through a user-defined class wrapper.
You should remember that the stacking procedure performs cross-validation inside. So if you initialize `StackingTransformer` with 4 folds, like so: `stack = StackingTransformer(n_folds=4)`, it means that when you call `stack.fit(X_train, y_train)` you actually fit 4 models, each on 3/4 of `X_train`. Now you want to perform early stopping for each of these 4 models, and you need a validation set to compute scores. Remember that you can NOT use the out-of-fold part (1/4 of `X_train`) for early stopping, because in each fold you predict this part, and you can NOT touch it to avoid overfitting.
To get a validation set you have two options:
- You can use the same fixed validation set for each of the 4 folds. You should prepare this set beforehand.
- You can generate a new validation set in each fold, e.g. 1/5 of the current fold's training data. It means that you will actually train on (4/5) of (3/4) of `X_train` (i.e. 12/20 of `X_train`) and perform early stopping on (1/5) of (3/4) of `X_train` (i.e. 3/20 of `X_train`). Just a reminder: the out-of-fold part (which you can NOT touch) is 1/4 of `X_train` (i.e. 5/20 of `X_train`). See example below.
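The split arithmetic above can be double-checked with a quick sketch (plain Python, my own illustration, not part of the original answer):

```python
from fractions import Fraction

fold_train = Fraction(3, 4)   # in-fold training part with n_folds=4
oof = 1 - fold_train          # out-of-fold part: predicted, never touched

inner_train = Fraction(4, 5) * fold_train  # data actually used for training
inner_val = Fraction(1, 5) * fold_train    # data used for early stopping

print(inner_train)  # 3/5, i.e. 12/20 of X_train
print(inner_val)    # 3/20 of X_train
print(oof)          # 1/4, i.e. 5/20 of X_train

# in each fold the three parts cover all of X_train
assert inner_train + inner_val + oof == 1
```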
Option 2. Complete example
```python
# Set up regression problem
import numpy as np
np.random.seed(42)

from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error as mse
from sklearn.model_selection import train_test_split
from lightgbm import LGBMRegressor
from vecstack import StackingTransformer

boston = load_boston()
X_train, X_test, y_train, y_test = train_test_split(boston.data,
                                                    boston.target,
                                                    test_size=0.2,
                                                    random_state=42)

#----------------------------------------------------------
# User-defined class wrapper

class WrapLGB(LGBMRegressor):
    """This is a template for a user-defined class wrapper.
    Use this template to pass any ``fit`` and ``predict`` arguments.
    """
    def fit(self, X, y):
        # Carve out a validation set from the current fold's training data
        X_tr, X_val, y_tr, y_val = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=42)
        return super(WrapLGB, self).fit(X_tr, y_tr,
                                        early_stopping_rounds=5,
                                        eval_set=[(X_val, y_val)],
                                        eval_metric='l2', verbose=1)

    def predict(self, X):
        # Predict with the number of trees found by early stopping
        return super(WrapLGB, self).predict(X,
                                            num_iteration=self.best_iteration_)

#----------------------------------------------------------
# Initialize StackingTransformer

estimators = [('wraplgb', WrapLGB(learning_rate=0.9,
                                  n_estimators=1000,
                                  random_state=42))]

stack = StackingTransformer(estimators, regression=True,
                            n_folds=4, metric=mse)

# Fit and transform
stack = stack.fit(X_train, y_train)
S_train = stack.transform(X_train)
S_test = stack.transform(X_test)
```
Output
I put the raw output here for demonstration. You can see that early stopping was performed in each of the 4 folds:
```
[1] valid_0's l2: 32.0246
Training until validation scores don't improve for 5 rounds.
[2] valid_0's l2: 23.464
[3] valid_0's l2: 22.2144
[4] valid_0's l2: 19.8271
[5] valid_0's l2: 22.7295
[6] valid_0's l2: 21.3527
[7] valid_0's l2: 22.6876
[8] valid_0's l2: 22.4059
[9] valid_0's l2: 21.4023
Early stopping, best iteration is:
[4] valid_0's l2: 19.8271
[1] valid_0's l2: 22.6718
Training until validation scores don't improve for 5 rounds.
[2] valid_0's l2: 22.0576
[3] valid_0's l2: 20.7717
[4] valid_0's l2: 21.4487
[5] valid_0's l2: 20.7593
[6] valid_0's l2: 19.9866
[7] valid_0's l2: 20.8062
[8] valid_0's l2: 20.8037
[9] valid_0's l2: 20.7226
[10] valid_0's l2: 20.7807
[11] valid_0's l2: 22.9261
Early stopping, best iteration is:
[6] valid_0's l2: 19.9866
[1] valid_0's l2: 36.1314
Training until validation scores don't improve for 5 rounds.
[2] valid_0's l2: 24.1133
[3] valid_0's l2: 17.6557
[4] valid_0's l2: 20.1154
[5] valid_0's l2: 20.1621
[6] valid_0's l2: 19.742
[7] valid_0's l2: 18.1264
[8] valid_0's l2: 17.9662
Early stopping, best iteration is:
[3] valid_0's l2: 17.6557
[1] valid_0's l2: 32.7848
Training until validation scores don't improve for 5 rounds.
[2] valid_0's l2: 26.3399
[3] valid_0's l2: 27.7075
[4] valid_0's l2: 25.7245
[5] valid_0's l2: 24.1551
[6] valid_0's l2: 22.0104
[7] valid_0's l2: 19.5018
[8] valid_0's l2: 19.4044
[9] valid_0's l2: 19.7235
[10] valid_0's l2: 19.9468
[11] valid_0's l2: 19.242
[12] valid_0's l2: 18.8428
[13] valid_0's l2: 19.4026
[14] valid_0's l2: 19.7783
[15] valid_0's l2: 20.3338
[16] valid_0's l2: 20.4569
[17] valid_0's l2: 20.5523
Early stopping, best iteration is:
[12] valid_0's l2: 18.8428
```
Please pay attention.
I made a little but important modification in the `predict` call inside the class definition in the previous comment.
Before: `num_iteration=super(WrapLGB, self).best_iteration_`
Now: `num_iteration=self.best_iteration_`
In this case both variants work identically, because `best_iteration_` is a property. But we should remember that `super(WrapLGB, self).best_iteration_` works only if `best_iteration_` is a property, whereas `self.best_iteration_` always works (it does not matter whether `best_iteration_` is a property or just a data attribute (class field)).
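The difference can be shown with a tiny example (my own illustration, unrelated to LightGBM):

```python
class Base(object):
    def __init__(self):
        self.plain = 1          # ordinary data attribute: lives on the instance

    @property
    def prop(self):             # property: lives on the class, so super() finds it
        return 2

class Child(Base):
    def read(self):
        # self.<attr> always works, for both kinds of attribute
        a = self.plain
        b = self.prop
        # super().<attr> only works for the property: attribute lookup on
        # the super() proxy goes through the class, not the instance dict
        c = super(Child, self).prop
        try:
            d = super(Child, self).plain
        except AttributeError:
            d = None            # plain data attribute is not found via super()
        return a, b, c, d

print(Child().read())  # (1, 2, 2, None)
```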
Thank you. I am thinking about the 'can NOT touch' part... I will try it. Thanks.