pplonski / my_ml_service

My Machine Learning Web Service

Error "y contains previously unseen labels: 'Private'" (Training may be unable to encode all categoricals)

dezoito opened this issue

When I first tested the RandomForestClassifier class, I got an error:

python manage.py test apps.ml.tests

{'status': 'Error', 'message': "y contains previously unseen labels: 'Private'"}

I believe that, because of the 30% test/train split, the training data contained no person with the workclass "Private", so the encoder saved in the training artifacts never assigned that value a number.
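Here is a minimal pure-Python sketch of this failure mode. It assumes a toy split in which "Private" lands only in the test rows. `SimpleLabelEncoder` is a hypothetical stand-in, not the project's code. It mimics how scikit-learn's `LabelEncoder` assigns integers to the sorted categories and rejects values it did not see in `fit()`.

```python
class SimpleLabelEncoder:
    """Hypothetical stand-in for a label encoder (not the project's actual code)."""

    def fit(self, values):
        # Sorted unique categories get consecutive integer codes.
        self.classes_ = sorted(set(values))
        self.mapping_ = {c: i for i, c in enumerate(self.classes_)}
        return self

    def transform(self, values):
        unseen = sorted(set(values) - set(self.mapping_))
        if unseen:
            # Categories absent from fit() have no code, so refuse them.
            raise ValueError(f"y contains previously unseen labels: {unseen}")
        return [self.mapping_[v] for v in values]


# Toy "workclass" column; the hypothetical split puts "Private" only in the test rows.
workclass = ["State-gov", "Self-emp-not-inc", "State-gov", "Federal-gov", "Private"]
train, test = workclass[:4], workclass[4:]

encoder = SimpleLabelEncoder().fit(train)
try:
    encoder.transform(test)
except ValueError as err:
    print(err)  # y contains previously unseen labels: ['Private']

# Workaround: fit the encoder on every value in the column, not only the training rows.
full_encoder = SimpleLabelEncoder().fit(workclass)
print(full_encoder.transform(test))  # [1]
```

Fitting the encoder on all values of the column avoids the error for any category that appears somewhere in the dataset. A category absent from the whole dataset would still fail.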

Rerunning the training and artifact generation in the Jupyter notebook seemed to fix it for me.

(Posting this in case someone gets stuck on this error, since I have no suggestion for preventing it in the first place.)

commented

Yes, the problem is with an unseen category.

commented

In the AutoML package I'm working on, I have a try ... except block for such situations. You can check the details here: https://github.com/mljar/mljar-supervised/blob/master/supervised/preprocessing/label_encoder.py#L14
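One way to apply the try ... except idea is sketched below. This is not necessarily what the linked `label_encoder.py` does. `FallbackLabelEncoder` is a hypothetical class, and mapping unseen categories to `-1` is an assumption chosen for illustration.

```python
class FallbackLabelEncoder:
    """Hypothetical encoder that maps unseen categories to a fallback code instead of failing."""

    def __init__(self, unknown_code=-1):
        # Assumption: -1 marks "category not seen during training".
        self.unknown_code = unknown_code

    def fit(self, values):
        # Sorted unique categories get consecutive integer codes.
        self.mapping_ = {c: i for i, c in enumerate(sorted(set(values)))}
        return self

    def transform(self, values):
        codes = []
        for value in values:
            try:
                codes.append(self.mapping_[value])
            except KeyError:
                # Unseen category: use the fallback code so the request still succeeds.
                codes.append(self.unknown_code)
        return codes


encoder = FallbackLabelEncoder().fit(["State-gov", "Federal-gov"])
print(encoder.transform(["Private", "State-gov"]))  # [-1, 1]
```

The model never saw the fallback code during training, so treat predictions for such rows with caution. Mapping unseen values to the most frequent training category is another common choice. Recent scikit-learn releases (0.24 and later) also provide `OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)` for the same purpose.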

Will do.
Thank you for the awesome work.