stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction


Error - LinAlgError: Singular matrix

joeanton719 opened this issue · comments

commented

Hi, I am using a for loop to compare the accuracy of 30 classification models on the training set in order to select the best one. All of the classification models run without error, but when I added ngboost to the list of models, I got the following error.

---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
<ipython-input-13-08b2e82f43ff> in <module>
      3 
      4 v = classif_models(X_train, y_train, cv = skf)
----> 5 v.check_clf_models()

<ipython-input-8-339118becac6> in check_clf_models(self)
     97                 y_train_kfold, y_val_kfold = self.ytrain[train_index], self.ytrain[test_index]
     98                 classifier = model
---> 99                 classifier.fit(X_train_kfold, y_train_kfold)
    100                 y_pred = classifier.predict(X_val_kfold)
    101 

~\anaconda3\lib\site-packages\ngboost\ngboost.py in fit(self, X, Y, X_val, Y_val, sample_weight, val_sample_weight, train_loss_monitor, val_loss_monitor, early_stopping_rounds)
    267             loss_list += [train_loss_monitor(D, Y_batch, weight_batch)]
    268             loss = loss_list[-1]
--> 269             grads = D.grad(Y_batch, natural=self.natural_gradient)
    270 
    271             proj_grad = self.fit_base(X_batch, grads, weight_batch)

~\anaconda3\lib\site-packages\ngboost\scores.py in grad(self, Y, natural)
     10         if natural:
     11             metric = self.metric()
---> 12             grad = np.linalg.solve(metric, grad)
     13         return grad
     14 

<__array_function__ internals> in solve(*args, **kwargs)

~\anaconda3\lib\site-packages\numpy\linalg\linalg.py in solve(a, b)
    392     signature = 'DD->D' if isComplexType(t) else 'dd->d'
    393     extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 394     r = gufunc(a, b, signature=signature, extobj=extobj)
    395 
    396     return wrap(r.astype(result_t, copy=False))

~\anaconda3\lib\site-packages\numpy\linalg\linalg.py in _raise_linalgerror_singular(err, flag)
     86 
     87 def _raise_linalgerror_singular(err, flag):
---> 88     raise LinAlgError("Singular matrix")
     89 
     90 def _raise_linalgerror_nonposdef(err, flag):

LinAlgError: Singular matrix

What's causing this error?

Here is the code snippet I am using (inspired by PyCaret):

tree_models = ["list of all classification algorithms"]

for model in tree_models:
    score = []

    for train_index, test_index in cv.split(Xtrain, ytrain):
        X_train_kfold, X_val_kfold = Xtrain[train_index], Xtrain[test_index]
        y_train_kfold, y_val_kfold = ytrain[train_index], ytrain[test_index]
        classifier = model
        classifier.fit(X_train_kfold, y_train_kfold)
        y_pred = classifier.predict(X_val_kfold)

        score.append(metrics.accuracy_score(y_val_kfold, y_pred))

Looking forward to your help!

Hey @joeanton719, it looks like you're hitting an ill-conditioned Fisher information matrix in the computation of the natural gradient. It's not really possible to say why this happens without some experimentation with the precise data you're using.
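To illustrate where the traceback comes from: the natural-gradient step in `ngboost/scores.py` solves a linear system against the Fisher information metric via `np.linalg.solve`, which raises `LinAlgError` whenever that matrix is singular. A minimal sketch of the failure mode (this is not ngboost's actual code; the rank-deficient `metric` below is a made-up stand-in for the degenerate metric):

```python
import numpy as np

# Hypothetical 2x2 "Fisher information" matrix that is exactly singular
# (rank 1), standing in for the degenerate metric ngboost can compute.
metric = np.array([[1.0, 2.0],
                   [2.0, 4.0]])
grad = np.array([1.0, 1.0])

try:
    # This is the call that fails inside scores.py: solve(metric, grad).
    natural_grad = np.linalg.solve(metric, grad)
except np.linalg.LinAlgError:
    # One possible workaround: the Moore-Penrose pseudo-inverse returns
    # the least-squares solution instead of raising on a singular matrix.
    natural_grad = np.linalg.pinv(metric) @ grad

print(natural_grad)
```

As a practical aside, `NGBClassifier` accepts `natural_gradient=False`, which skips this solve entirely (falling back to the ordinary gradient) and may sidestep the error, though you lose the natural-gradient behavior ngboost is built around.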

I do want to point out, though, that you're using NGBoost for classification here. In the classification setting, NGBoost doesn't really have any theoretical advantage over other boosting algorithms, because classification is almost always treated probabilistically anyway. It's not that NGBoost is bad for classification; it's just that other libraries are more robust while giving you the same thing. For more detail, see, for instance: https://towardsdatascience.com/interpreting-the-probabilistic-predictions-from-ngboost-868d6f3770b2
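To make the "other libraries give you the same thing" point concrete: any scikit-learn-style boosting classifier already exposes per-class probabilities via `predict_proba`. A minimal sketch using `GradientBoostingClassifier` on synthetic data (the dataset and parameters here are illustrative, not from the issue):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic binary classification data for illustration only.
X, y = make_classification(n_samples=200, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X, y)

# One row per sample, one column per class; rows sum to 1.
proba = clf.predict_proba(X[:5])
print(proba.shape)
```

This is the same probabilistic output one would read off an NGBoost classifier, without the natural-gradient machinery that triggered the singular-matrix solve.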

commented

OK, thank you for that information. Perhaps I will just use ngboost for regression problems. Thanks for your reply!