fabsig / GPBoost

Combining tree-boosting with Gaussian process and mixed effects models


Predictions from a saved model do not give the same results as the original

xsher opened this issue · comments

Hi, I am trying to save my model out so that it can be loaded elsewhere for predictions.

My model is trained and saved with the following lines of code

# Train code
gpm = gpb.train(params=params, train_set=data_train, gp_model=gp_model, num_boost_round=num_boosting_round) 

# Save code
gpm.save_model(model_path) 

I loaded the model again in the same script to ensure that the test data is exactly the same using the following lines:

# Load code
loaded_gpm = gpb.Booster(params, model_file=model_path)
loaded_gpm.predict(data=X_test_processed, group_data_pred=X_test_group, gp_coords_pred=None, predict_var=True, pred_latent=False)['response_mean']

However, the prediction outputs from gpm.predict differ from those of loaded_gpm.predict.

I noted that this is very similar to the following example:

import numpy as np
import gpboost as gpb

# Train a model with grouped random effects
gp_model = gpb.GPModel(group_data=group, likelihood=likelihood)
data_train = gpb.Dataset(X, y)
bst = gpb.train(params=params, train_set=data_train,
                gp_model=gp_model, num_boost_round=num_boost_round)
group_test = np.array([1, 2, -1])
Xtest = np.random.rand(len(group_test), p)
pred = bst.predict(data=Xtest, group_data_pred=group_test,
                   predict_var=True, pred_latent=True)
# Save model
bst.save_model('model.json')
# Load from file and make predictions again
bst_loaded = gpb.Booster(model_file='model.json')
pred_loaded = bst_loaded.predict(data=Xtest, group_data_pred=group_test,
                                 predict_var=True, pred_latent=True)
# Check equality
print(pred['fixed_effect'] - pred_loaded['fixed_effect'])
print(pred['random_effect_mean'] - pred_loaded['random_effect_mean'])
print(pred['random_effect_cov'] - pred_loaded['random_effect_cov'])
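Rather than eyeballing the printed differences, a tolerance-based check makes the comparison explicit. A minimal sketch using only NumPy, with hypothetical arrays standing in for the actual `pred` and `pred_loaded` dictionaries returned by `predict`:

```python
import numpy as np

# Hypothetical prediction outputs, standing in for pred and pred_loaded
pred = {"fixed_effect": np.array([0.1, 0.2, 0.3]),
        "random_effect_mean": np.array([1.0, -0.5, 0.0])}
pred_loaded = {"fixed_effect": np.array([0.1, 0.2, 0.3]),
               "random_effect_mean": np.array([1.0, -0.5, 0.0])}

# Compare each component element-wise within floating-point tolerance
all_equal = all(np.allclose(pred[k], pred_loaded[k]) for k in pred)
print(all_equal)  # True when saved and loaded models agree
```

If the saved and loaded models are consistent, every component should match up to floating-point tolerance.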

May I get some guidance as to why it is giving me different results in my case? Thank you!

Thanks a lot for using GPBoost and for reporting this issue!

Can you provide a minimal working example (including data, maybe simulated) so that I can reproduce this issue? Which version of GPBoost are you using?

Hi! Thank you for getting back to me so quickly.

I created a gist with a data simulation method similar to the example, showing how I run the model: https://gist.github.com/xsher/f8710fdfb0c99c4c09fe0de2109ab529. I still observe a discrepancy in the prediction outputs.

I am using GPBoost version 0.8.1.

The reason for this bug is that saving and loading from file did not work correctly when doing Nesterov-accelerated boosting. I have fixed this now (with GPBoost version 1.0.1).

FWIW: Nesterov acceleration can be used in GPBoost both for covariance parameter estimation and for the boosting itself. You are currently applying Nesterov-accelerated boosting since you set 'use_nesterov_acc': True in the params passed to gpb.train(). See lines 4-9 of "Algorithm 1: GPBoost" in Sigrist (2022, JMLR) for more information on Nesterov-accelerated boosting.
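For context, this is a sketch of what such a parameter dictionary might look like; apart from 'use_nesterov_acc', which is the setting named in this thread, the other keys and values are illustrative assumptions, not taken from the reporter's code:

```python
# Hypothetical boosting parameters for gpb.train();
# 'use_nesterov_acc' is the flag that triggered this bug
params = {
    "objective": "regression",
    "learning_rate": 0.1,
    "use_nesterov_acc": True,  # enables Nesterov-accelerated boosting
}
```

With this flag set, models trained before the 1.0.1 fix would not round-trip correctly through save_model / Booster(model_file=...).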

Thanks again for reporting this bug!