Question - Fit and predict() only using calibration variables?

SSMK-wq opened this issue 2 years ago

My current code looks like this:

```python
from lifetimes import BetaGeoFitter

bgf = BetaGeoFitter(penalizer_coef=0.001)  # model object
# Model fitting on the calibration-period summary statistics
bgf.fit(summary_cal_holdout['frequency_cal'], summary_cal_holdout['recency_cal'], summary_cal_holdout['T_cal'])

# Prediction of the expected number of transactions for each customer for one year (365 days)
summary_cal_holdout['expctd_num_of_purch'] = bgf.predict(365, summary_cal_holdout['frequency_cal'], summary_cal_holdout['recency_cal'], summary_cal_holdout['T_cal'])
summary_cal_holdout.sort_values("expctd_num_of_purch", ascending=False).head()
```

As you can see, I fit the model using frequency_cal, recency_cal and T_cal (from the calibration dataset).

Then, in the next line (bgf.predict), I pass the same frequency_cal, recency_cal and T_cal again.

Is this the right thing to do, i.e., passing the same variables as input to both fit() and predict()? I am used to fitting on X_train and predicting on X_test in typical ML models, so I was a bit confused.
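One way to see how this maps onto X_train/X_test: the split is over time rather than over customers. The calibration columns describe each customer's past and are the input to both fit() and predict(), while frequency_holdout (the purchases the same customers make later) plays the role of the test target. The sketch below is a minimal pure-Python illustration under assumed inputs: a made-up list of (customer_id, date) pairs with daily periods. `summarize_cal_holdout` is a hypothetical helper that only roughly mirrors the columns `lifetimes.utils.calibration_and_holdout_data` produces; it is not that function's implementation.

```python
from collections import defaultdict
from datetime import date


def summarize_cal_holdout(transactions, calibration_end, observation_end):
    """Hypothetical helper: split each customer's history at calibration_end.

    Calibration columns feed both fit() and predict(); frequency_holdout is
    only used afterwards to check the predictions.
    """
    purchase_days = defaultdict(set)
    for customer, day in transactions:
        purchase_days[customer].add(day)  # daily periods: same-day purchases count once

    summary = {}
    for customer, days in purchase_days.items():
        cal = sorted(d for d in days if d <= calibration_end)
        if not cal:
            continue  # first purchase falls in the holdout period: no calibration history
        holdout = [d for d in days if calibration_end < d <= observation_end]
        summary[customer] = {
            "frequency_cal": len(cal) - 1,             # repeat purchase days
            "recency_cal": (cal[-1] - cal[0]).days,    # first to last calibration purchase
            "T_cal": (calibration_end - cal[0]).days,  # first purchase to end of calibration
            "frequency_holdout": len(holdout),         # the "test" target
            "duration_holdout": (observation_end - calibration_end).days,
        }
    return summary


# Made-up transaction log: (customer_id, purchase date)
transactions = [
    ("A", date(2020, 1, 1)), ("A", date(2020, 2, 1)), ("A", date(2020, 3, 1)),
    ("A", date(2020, 7, 15)),
    ("B", date(2020, 5, 20)),
    ("C", date(2020, 8, 1)),
]
summary = summarize_cal_holdout(transactions, date(2020, 6, 30), date(2020, 12, 31))
print(summary["A"])
```

Customer A's calibration history (three purchase days) is what the model sees; the July purchase only shows up in frequency_holdout. Customer C has no calibration history at all, so there is nothing to predict from and it is dropped.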

Since our objective is to predict frequency_holdout, is there no option other than fitting and predicting with the same calibration variables? In other words, does fit() learn the model parameters (of the underlying probability distributions), and does predict() then take the same input variables plus a time horizon and use those learnt parameters to produce the prediction?
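The mechanism described in the previous paragraph can be illustrated with the BG/NBD conditional expectation. As far as I know, `BetaGeoFitter.predict` in lifetimes is an alias of `conditional_expected_number_of_purchases_up_to_time`, which evaluates this expression: fit() estimates four parameters (r, alpha, a, b) shared by all customers, and predict() combines them with each customer's own frequency, recency and T plus the horizon t. This is a dependency-free sketch, not the library's code: `hyp2f1_series` is a plain power-series stand-in for `scipy.special.hyp2f1`, and the parameter values are hypothetical, not estimates from any data.

```python
def hyp2f1_series(a, b, c, z, tol=1e-14, max_terms=100_000):
    """Gauss hypergeometric 2F1(a, b; c; z) by its power series (for 0 <= z < 1)."""
    total, term = 1.0, 1.0
    for n in range(max_terms):
        ratio = (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        term *= ratio
        total += term
        # Stop once terms are shrinking and negligible relative to the sum
        if ratio < 1 and abs(term) < tol * abs(total):
            break
    return total


def bgnbd_expected_purchases(t, x, t_x, T, r, alpha, a, b):
    """Expected purchases in the next t periods for a customer with
    frequency x, recency t_x and age T, given BG/NBD parameters r, alpha, a, b."""
    z = t / (alpha + T + t)
    hyp = hyp2f1_series(r + x, b + x, a + b + x - 1, z)
    first = (a + b + x - 1) / (a - 1)
    second = 1 - ((alpha + T) / (alpha + T + t)) ** (r + x) * hyp
    # Probability-of-being-alive adjustment; only applies to repeat customers (x > 0)
    if x == 0:
        denom = 1.0
    else:
        denom = 1 + (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return first * second / denom


# Hypothetical parameters standing in for what fit() would learn (not real estimates)
params = dict(r=0.5, alpha=5.0, a=1.5, b=3.0)

# Same learnt parameters, two customers' calibration histories, 365-day horizon
recent = bgnbd_expected_purchases(365, x=3, t_x=95, T=100, **params)
stale = bgnbd_expected_purchases(365, x=3, t_x=10, T=100, **params)
print(recent, stale)  # the recently active customer gets the higher expectation
```

The two customers share the same frequency and T and differ only in recency, so the gap between `recent` and `stale` comes from the shared parameters interacting with each customer's own calibration history. To compare predictions against frequency_holdout, t would be the length of the holdout period (duration_holdout) rather than 365.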

Sorry if my question seems redundant. I am a bit confused, but all your responses are helping me get some clarity. I appreciate your help.