Error - LinAlgError: Singular matrix
joeanton719 opened this issue
Hi, I am using a for loop to compare the accuracy of 30 classification models on the training set in order to select the best one. All the classification models run without any error, but when I add ngboost to the list of models, I get the following error.
---------------------------------------------------------------------------
LinAlgError Traceback (most recent call last)
<ipython-input-13-08b2e82f43ff> in <module>
3
4 v = classif_models(X_train, y_train, cv = skf)
----> 5 v.check_clf_models()
<ipython-input-8-339118becac6> in check_clf_models(self)
97 y_train_kfold, y_val_kfold = self.ytrain[train_index], self.ytrain[test_index]
98 classifier = model
---> 99 classifier.fit(X_train_kfold, y_train_kfold)
100 y_pred = classifier.predict(X_val_kfold)
101
~\anaconda3\lib\site-packages\ngboost\ngboost.py in fit(self, X, Y, X_val, Y_val, sample_weight, val_sample_weight, train_loss_monitor, val_loss_monitor, early_stopping_rounds)
267 loss_list += [train_loss_monitor(D, Y_batch, weight_batch)]
268 loss = loss_list[-1]
--> 269 grads = D.grad(Y_batch, natural=self.natural_gradient)
270
271 proj_grad = self.fit_base(X_batch, grads, weight_batch)
~\anaconda3\lib\site-packages\ngboost\scores.py in grad(self, Y, natural)
10 if natural:
11 metric = self.metric()
---> 12 grad = np.linalg.solve(metric, grad)
13 return grad
14
<__array_function__ internals> in solve(*args, **kwargs)
~\anaconda3\lib\site-packages\numpy\linalg\linalg.py in solve(a, b)
392 signature = 'DD->D' if isComplexType(t) else 'dd->d'
393 extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 394 r = gufunc(a, b, signature=signature, extobj=extobj)
395
396 return wrap(r.astype(result_t, copy=False))
~\anaconda3\lib\site-packages\numpy\linalg\linalg.py in _raise_linalgerror_singular(err, flag)
86
87 def _raise_linalgerror_singular(err, flag):
---> 88 raise LinAlgError("Singular matrix")
89
90 def _raise_linalgerror_nonposdef(err, flag):
LinAlgError: Singular matrix
What's causing this error?
Here is the code snippet I am using (Inspired by Pycaret):
tree_models = ["list of all classification algorithms"]
for model in tree_models:
    score = []
    for train_index, test_index in cv.split(Xtrain, ytrain):
        X_train_kfold, X_val_kfold = Xtrain[train_index], Xtrain[test_index]
        y_train_kfold, y_val_kfold = ytrain[train_index], ytrain[test_index]
        classifier = model
        classifier.fit(X_train_kfold, y_train_kfold)
        y_pred = classifier.predict(X_val_kfold)
        score.append(metrics.accuracy_score(y_val_kfold, y_pred))
Looking forward to your help!
Hey @joeanton719, it looks like you're hitting a singular (or very ill-conditioned) Fisher information matrix in the computation of the natural gradient: the traceback fails at np.linalg.solve(metric, grad) in scores.py. It's not really possible to say why this happens without some experimentation with the precise data you're using.
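To make the failure mode concrete, here is a minimal sketch of one way a Fisher information matrix can become exactly singular. It assumes a categorical distribution parametrized by K-1 free logits with the last class as the reference, which may or may not match what NGBoost does internally. The helper names `categorical_fisher` and `solve_with_fallback` are hypothetical and not part of NGBoost. The least-squares fallback is one common workaround, not a fix the library provides.

```python
import numpy as np


def categorical_fisher(probs):
    """Fisher information w.r.t. the K-1 free logits of a softmax,
    assuming the last class is the reference: diag(p) - p p^T."""
    p = np.asarray(probs, dtype=float)[:-1]
    return np.diag(p) - np.outer(p, p)


def solve_with_fallback(metric, grad):
    """Try an exact solve; if the matrix is singular, fall back to the
    minimum-norm least-squares solution. Returns (solution, used_fallback)."""
    try:
        return np.linalg.solve(metric, grad), False
    except np.linalg.LinAlgError:
        solution = np.linalg.lstsq(metric, grad, rcond=None)[0]
        return solution, True


grad = np.array([1.0, -1.0])

# Well-spread class probabilities: the metric is invertible.
healthy = categorical_fisher([0.2, 0.3, 0.5])
nat_grad, used_fallback = solve_with_fallback(healthy, grad)

# Probabilities saturated at 0 and 1: the metric is all zeros,
# so np.linalg.solve raises LinAlgError("Singular matrix").
degenerate = categorical_fisher([0.0, 0.0, 1.0])
nat_grad_deg, used_fallback_deg = solve_with_fallback(degenerate, grad)
```

If something like this is what happens on your data, predicted probabilities that saturate (for example, a class that is perfectly separable or absent from a fold) would be the first thing to check.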
I do want to point out, though, that you're using NGBoost for classification here. In the classification setting, NGBoost doesn't have any theoretical advantage over other boosting algorithms, because classification is almost always treated probabilistically already. It's not that NGBoost is bad for classification; it's just that other libraries are more robust while giving you the same thing anyway. For more detail, see, for instance: https://towardsdatascience.com/interpreting-the-probabilistic-predictions-from-ngboost-868d6f3770b2
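As an illustration of "the same thing", here is a short sketch using scikit-learn's GradientBoostingClassifier on synthetic data. The dataset and hyperparameters are arbitrary choices for the example, not a recommendation for your problem.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic 3-class problem, used only for illustration.
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=5,
    n_classes=3,
    random_state=0,
)

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# Each row is a predicted probability distribution over the 3 classes.
proba = clf.predict_proba(X[:5])
```

Each row of `proba` sums to 1, so you already get a full predictive distribution over the classes without needing NGBoost.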
Ok, thank you for that information. Perhaps I will just use ngboost for regression problems. Thank you for your reply!