fabsig / GPBoost

Combining tree-boosting with Gaussian process and mixed effects models


Predictions from a saved model do not give the same results as the original

xsher opened this issue · comments

Hi, I am trying to save my model out so that it can be loaded elsewhere for predictions.

My model is trained and saved with the following lines of code

# Train code
gpm = gpb.train(params=params, train_set=data_train, gp_model=gp_model, num_boost_round=num_boosting_round) 

# Save code
gpm.save_model(model_path) 

I loaded the model again in the same script to ensure that the test data is exactly the same using the following lines:

# Load code
loaded_gpm = gpb.Booster(params, model_file=model_path)
loaded_gpm.predict(data=X_test_processed, group_data_pred=X_test_group, gp_coords_pred=None, predict_var=True, pred_latent=False)['response_mean']

However, the prediction outputs from gpm.predict differ from those of loaded_gpm.predict.

I noted that this is very similar to the following example:

import numpy as np
import gpboost as gpb

# Train a model with grouped random effects
gp_model = gpb.GPModel(group_data=group, likelihood=likelihood)
data_train = gpb.Dataset(X, y)
bst = gpb.train(params=params, train_set=data_train,
                gp_model=gp_model, num_boost_round=num_boost_round)
group_test = np.array([1, 2, -1])
Xtest = np.random.rand(len(group_test), p)
pred = bst.predict(data=Xtest, group_data_pred=group_test,
                   predict_var=True, pred_latent=True)
# Save model
bst.save_model('model.json')
# Load from file and make predictions again
bst_loaded = gpb.Booster(model_file='model.json')
pred_loaded = bst_loaded.predict(data=Xtest, group_data_pred=group_test,
                                 predict_var=True, pred_latent=True)
# Check equality
print(pred['fixed_effect'] - pred_loaded['fixed_effect'])
print(pred['random_effect_mean'] - pred_loaded['random_effect_mean'])
print(pred['random_effect_cov'] - pred_loaded['random_effect_cov'])
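Rather than eyeballing the printed differences, a tolerance-based check makes the comparison explicit. A minimal sketch using only NumPy, with hypothetical arrays standing in for the actual `pred` and `pred_loaded` dictionaries returned by `predict`:

```python
import numpy as np

# Hypothetical prediction outputs, standing in for pred and pred_loaded
pred = {"fixed_effect": np.array([0.1, 0.2, 0.3]),
        "random_effect_mean": np.array([1.0, -0.5, 0.0])}
pred_loaded = {"fixed_effect": np.array([0.1, 0.2, 0.3]),
               "random_effect_mean": np.array([1.0, -0.5, 0.0])}

# Compare each component element-wise within floating-point tolerance
all_equal = all(np.allclose(pred[k], pred_loaded[k]) for k in pred)
print(all_equal)  # True when saved and loaded models agree
```

If the saved and loaded models are consistent, every component should match up to floating-point tolerance.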

May I get some guidance as to why it is giving me different results in my case? Thank you!

Thanks a lot for using GPBoost and for reporting this issue!

Can you provide a minimal working example (including data, maybe simulated) so that I can reproduce this issue? Which version of GPBoost are you using?

Hi! Thank you for getting back to me so quickly.

I created a gist with a data simulation method similar to the example, showing how I run the model: https://gist.github.com/xsher/f8710fdfb0c99c4c09fe0de2109ab529. I still observe a discrepancy in the prediction outputs.

I am using GPBoost version 0.8.1.

The reason for this bug is that saving and loading from file did not work correctly when doing Nesterov-accelerated boosting. I have fixed this now (with GPBoost version 1.0.1).

FWIW: Nesterov acceleration can be used in GPBoost both for covariance parameter estimation and for the boosting itself. You are currently applying Nesterov-accelerated boosting since you set 'use_nesterov_acc': True in the params passed to gpb.train(). See lines 4-9 of "Algorithm 1: GPBoost" in Sigrist (2022, JMLR) for more information on Nesterov-accelerated boosting.
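For context, this is a sketch of what such a parameter dictionary might look like; apart from 'use_nesterov_acc', which is the setting named in this thread, the other keys and values are illustrative assumptions, not taken from the reporter's code:

```python
# Hypothetical boosting parameters for gpb.train();
# 'use_nesterov_acc' is the flag that triggered this bug
params = {
    "objective": "regression",
    "learning_rate": 0.1,
    "use_nesterov_acc": True,  # enables Nesterov-accelerated boosting
}
```

With this flag set, models trained before the 1.0.1 fix would not round-trip correctly through save_model / Booster(model_file=...).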

Thanks again for reporting this bug!