stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction


Error - LinAlgError: Singular matrix

joeanton719 opened this issue · comments

commented

Hi, I am using a for loop to compare the accuracy of 30 classification models on the training set in order to select the best one. All of the classification models run without error, but when I added ngboost to the list of models, I got the following error.

---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
<ipython-input-13-08b2e82f43ff> in <module>
      3 
      4 v = classif_models(X_train, y_train, cv = skf)
----> 5 v.check_clf_models()

<ipython-input-8-339118becac6> in check_clf_models(self)
     97                 y_train_kfold, y_val_kfold = self.ytrain[train_index], self.ytrain[test_index]
     98                 classifier = model
---> 99                 classifier.fit(X_train_kfold, y_train_kfold)
    100                 y_pred = classifier.predict(X_val_kfold)
    101 

~\anaconda3\lib\site-packages\ngboost\ngboost.py in fit(self, X, Y, X_val, Y_val, sample_weight, val_sample_weight, train_loss_monitor, val_loss_monitor, early_stopping_rounds)
    267             loss_list += [train_loss_monitor(D, Y_batch, weight_batch)]
    268             loss = loss_list[-1]
--> 269             grads = D.grad(Y_batch, natural=self.natural_gradient)
    270 
    271             proj_grad = self.fit_base(X_batch, grads, weight_batch)

~\anaconda3\lib\site-packages\ngboost\scores.py in grad(self, Y, natural)
     10         if natural:
     11             metric = self.metric()
---> 12             grad = np.linalg.solve(metric, grad)
     13         return grad
     14 

<__array_function__ internals> in solve(*args, **kwargs)

~\anaconda3\lib\site-packages\numpy\linalg\linalg.py in solve(a, b)
    392     signature = 'DD->D' if isComplexType(t) else 'dd->d'
    393     extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 394     r = gufunc(a, b, signature=signature, extobj=extobj)
    395 
    396     return wrap(r.astype(result_t, copy=False))

~\anaconda3\lib\site-packages\numpy\linalg\linalg.py in _raise_linalgerror_singular(err, flag)
     86 
     87 def _raise_linalgerror_singular(err, flag):
---> 88     raise LinAlgError("Singular matrix")
     89 
     90 def _raise_linalgerror_nonposdef(err, flag):

LinAlgError: Singular matrix

What's causing this error?

Here is the code snippet I am using (inspired by PyCaret):

tree_models = ["list of all classification algorithms"]

for model in tree_models:
    score = []

    for train_index, test_index in cv.split(Xtrain, ytrain):
        X_train_kfold, X_val_kfold = Xtrain[train_index], Xtrain[test_index]
        y_train_kfold, y_val_kfold = ytrain[train_index], ytrain[test_index]
        classifier = model
        classifier.fit(X_train_kfold, y_train_kfold)
        y_pred = classifier.predict(X_val_kfold)

        score.append(metrics.accuracy_score(y_val_kfold, y_pred))

Looking forward to your help!

Hey @joeanton719, it looks like you're hitting an ill-conditioned Fisher information matrix in the computation of the natural gradient. It's not really possible to say why this happens without some experimentation with the precise data you're using.
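To illustrate where the traceback comes from: the natural-gradient step in `ngboost/scores.py` solves a linear system against the Fisher information metric via `np.linalg.solve`, which raises `LinAlgError` whenever that matrix is singular. A minimal sketch of the failure mode (this is not ngboost's actual code; the rank-deficient `metric` below is a made-up stand-in for the degenerate metric):

```python
import numpy as np

# Hypothetical 2x2 "Fisher information" matrix that is exactly singular
# (rank 1), standing in for the degenerate metric ngboost can compute.
metric = np.array([[1.0, 2.0],
                   [2.0, 4.0]])
grad = np.array([1.0, 1.0])

try:
    # This is the call that fails inside scores.py: solve(metric, grad).
    natural_grad = np.linalg.solve(metric, grad)
except np.linalg.LinAlgError:
    # One possible workaround: the Moore-Penrose pseudo-inverse returns
    # the least-squares solution instead of raising on a singular matrix.
    natural_grad = np.linalg.pinv(metric) @ grad

print(natural_grad)
```

As a practical aside, `NGBClassifier` accepts `natural_gradient=False`, which skips this solve entirely (falling back to the ordinary gradient) and may sidestep the error, though you lose the natural-gradient behavior ngboost is built around.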

I do want to point out, though, that you're using NGBoost for classification here. In the classification setting, NGBoost doesn't really have any theoretical advantage over other boosting algorithms, because classification is almost always treated probabilistically anyway. It's not that NGBoost is bad for classification; it's just that other libraries are more robust while giving you the same thing. For more detail, see, for instance: https://towardsdatascience.com/interpreting-the-probabilistic-predictions-from-ngboost-868d6f3770b2
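To make the "other libraries give you the same thing" point concrete: any scikit-learn-style boosting classifier already exposes per-class probabilities via `predict_proba`. A minimal sketch using `GradientBoostingClassifier` on synthetic data (the dataset and parameters here are illustrative, not from the issue):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic binary classification data for illustration only.
X, y = make_classification(n_samples=200, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X, y)

# One row per sample, one column per class; rows sum to 1.
proba = clf.predict_proba(X[:5])
print(proba.shape)
```

This is the same probabilistic output one would read off an NGBoost classifier, without the natural-gradient machinery that triggered the singular-matrix solve.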

commented

OK, thank you for that information. Perhaps I will just use ngboost for regression problems. Thanks for your reply!