Issue with T distribution

Question

eluerken-wm opened this issue 3 years ago

I was trying to change the underlying distribution to Student's t-distribution, but it emits runtime warnings on every iteration. For example, when I adapted the demo example to:

from ngboost import NGBRegressor
from ngboost.distns import T

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# load_boston was removed in scikit-learn 1.2; this call needs an older version
X, Y = load_boston(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

ngb = NGBRegressor(Dist=T).fit(X_train, Y_train)
Y_preds = ngb.predict(X_test)
Y_dists = ngb.pred_dist(X_test)

# test Mean Squared Error
test_MSE = mean_squared_error(Y_preds, Y_test)
print('Test MSE', test_MSE)

# test Negative Log Likelihood
test_NLL = -Y_dists.logpdf(Y_test).mean()
print('Test NLL', test_NLL)

I get the following warnings repeated at every iteration (this is a small sample; they were printed 500 times):

/usr/local/lib/python3.8/site-packages/ngboost/distns/t.py:64: RuntimeWarning: overflow encountered in exp
  self.df = np.exp(params[2])
/usr/local/lib/python3.8/site-packages/scipy/stats/_continuous_distns.py:6321: RuntimeWarning: invalid value encountered in subtract
  lPx = (sc.gammaln((r+1)/2) - sc.gammaln(r/2)
/usr/local/lib/python3.8/site-packages/scipy/stats/_continuous_distns.py:6322: RuntimeWarning: invalid value encountered in multiply
  - (0.5*np.log(r*np.pi) + (r+1)/2*np.log(1+(x**2)/r)))
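
For context, the first warning points at `self.df = np.exp(params[2])`: if the raw parameter grows large enough, `np.exp` overflows to `inf`, and any later `inf - inf` yields `nan`, which would be consistent with the "invalid value" warnings from scipy. The following is only a minimal sketch of that arithmetic using NumPy alone; `raw_log_df` is a hypothetical value, since the thread does not show the actual parameters ngboost produced.

```python
import numpy as np

# Hypothetical raw (log-scale) degrees-of-freedom parameter; not taken from the thread.
raw_log_df = np.float64(800.0)

# Silence the RuntimeWarnings here so the sketch prints only the values.
with np.errstate(over="ignore", invalid="ignore"):
    df = np.exp(raw_log_df)  # float64 exp overflows for inputs above roughly 709.78
    diff = df - df           # inf - inf is undefined, so the result is nan

print("df:", df)
print("df - df:", diff)
```

Whether this is exactly what happens inside ngboost for a given dataset is an assumption; it only shows how an overflowing `exp` can lead to `nan` in a later subtraction.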

Alejandro Schuler · Answer 1 · Fri Aug 20 2021 09:25:50 GMT+0800 (China Standard Time)

This issue crops up from time to time with the continuous distributions. We haven't been able to solve it in all cases, but in general we've found that normalizing the target Y can help.
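
One way to try the normalization suggested above is to standardize Y to zero mean and unit variance before fitting, then map point predictions back to the original units. This is a sketch under assumptions, not a confirmed fix: the helper names `standardize`, `unstandardize`, and `fit_t_on_scaled_target` are hypothetical, and only the ngboost calls already used in the question (`NGBRegressor(Dist=T)`, `.fit`, `.predict`) are relied on.

```python
import numpy as np


def standardize(y):
    """Return (y_scaled, mean, std), where y_scaled has zero mean and unit variance."""
    y = np.asarray(y, dtype=float)
    mean, std = y.mean(), y.std()
    return (y - mean) / std, mean, std


def unstandardize(y_scaled, mean, std):
    """Map values on the scaled target back to the original units."""
    return np.asarray(y_scaled, dtype=float) * std + mean


def fit_t_on_scaled_target(X_train, Y_train, X_test):
    """Fit NGBRegressor(Dist=T) on a standardized target; return predictions in original units."""
    # Imported lazily so the two helpers above can be used without ngboost installed.
    from ngboost import NGBRegressor
    from ngboost.distns import T

    Y_scaled, mean, std = standardize(Y_train)
    ngb = NGBRegressor(Dist=T).fit(X_train, Y_scaled)
    return unstandardize(ngb.predict(X_test), mean, std)
```

With the variables from the question, this would be called as `Y_preds = fit_t_on_scaled_target(X_train, Y_train, X_test)`. Note that a negative log likelihood computed on the scaled target is offset from the original-scale value by `log(std)`, so the two are not directly comparable without that correction.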