Can't reproduce default MSE loss function
marcohkm opened this issue
Hello
I aim to train two XGBoost models:
One using the built-in reg:squarederror (MSE) objective.
Another using a custom objective designed to mimic MSE.
Although my custom loss function should be mathematically identical to MSE, the two models produce different predictions. Here is the code illustrating the issue:
import random

import numpy as np
import xgboost as xgb

# Fixing seeds for reproducibility
np.random.seed(42)
random.seed(42)
def custom_loss_function(preds, dtrain):
    labels = dtrain.get_label()
    errors = preds - labels
    grad = 2 * errors
    hess = np.ones_like(grad) * 2
    return grad, hess
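For reference, XGBoost's built-in reg:squarederror follows the convention grad = preds - labels, hess = 1 (the derivative of ½(pred - label)², not of the full squared error). A sketch of an objective matching that convention (the function name is my own):

```python
import numpy as np

def squarederror_objective(preds, dtrain):
    """Objective matching XGBoost's built-in reg:squarederror convention.

    XGBoost minimizes 1/2 * (pred - label)^2 internally, so the gradient
    is (pred - label) and the Hessian is 1 -- not 2 * (pred - label) and 2.
    """
    labels = dtrain.get_label()
    grad = preds - labels
    hess = np.ones_like(preds)
    return grad, hess
```

This distinction matters here because leaf weights are computed as -G/(H + lambda) with L1 thresholding of G by alpha; with lambda (default 1) or alpha non-zero, scaling G and H by the same constant changes the resulting trees.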
def xgboost_model_with_custom_loss(params, n, num_boost_round):
    features = select_features.sort_values(by='Rank_XGBoost', ascending=True)['Feature'].values.tolist()[:n]
    X_train, X_test = df_train[features], df_test[features]
    y_train, y_test = df_train['Y'], df_test['Y']
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dtest = xgb.DMatrix(X_test, label=y_test)
    params['seed'] = 42

    # Training with the custom loss function
    model1 = xgb.train(params, dtrain, num_boost_round=num_boost_round, obj=custom_loss_function)
    y_pred1 = model1.predict(dtest)

    # Training with the default MSE loss function
    params['objective'] = 'reg:squarederror'
    model2 = xgb.train(params, dtrain, num_boost_round=num_boost_round)
    y_pred2 = model2.predict(dtest)

    # Comparing predictions
    difference = y_pred2 - y_pred1
    print(f"Difference: {difference[:10]}")
    return y_pred2, y_pred1
# Example usage
params = {
    'max_depth': 4,
    'eta': 0.01,
    'min_child_weight': 9,
    'subsample': 0.5,
    'alpha': 100,
    'colsample_bytree': 0.3,
}
n = 10
num_boost_round = 100
y_pred2, y_pred1 = xgboost_model_with_custom_loss(params, n, num_boost_round)
The returned tuple (y_pred2, y_pred1):

(array([-0.00540604, 0.00381444, -0.0029127 , ..., -0.00581155,
        0.00467715, 0.0053932 ], dtype=float32),
 array([1.3519797, 1.3480775, 1.3533579, ..., 1.3520167, 1.3488159,
        1.3469104], dtype=float32))
How can I fix this?
Thanks in advance.