AutoViML / featurewiz

Use advanced feature engineering strategies and select best features from your data set with a single line of code. Created by Ram Seshadri. Collaborators welcome.



featurewiz ignores category columns with an example

reza1615 opened this issue


My dataframe has 12408 columns, 536 of which are binary features.
Here is my process:
1- Set the dtype of the binary columns to category
2- After running fw.featurewiz, it dropped the 536 category columns before SULOV (12404 - 1 (target) - 536 (dtype category) => 11867)
For decision-tree algorithms such as XGBoost, using the category dtype usually brings better performance, speed, and splitting. Please read
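A minimal pandas sketch of step 1, plus a column count that mirrors the arithmetic in step 2. The helper names `cast_binary_to_category` and `count_columns` and the toy data are hypothetical (they are not part of featurewiz), and "binary" is assumed to mean exactly two distinct non-null values. In the reported run, the `non_category` count corresponds to the 11867 columns that reached SULOV.

```python
import pandas as pd


def cast_binary_to_category(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Return a copy of df in which every non-target column with exactly two
    distinct non-null values uses the pandas 'category' dtype.
    (Hypothetical helper; 'binary' = two distinct values is an assumption.)"""
    out = df.copy()
    for col in out.columns:
        if col != target and out[col].nunique(dropna=True) == 2:
            out[col] = out[col].astype("category")
    return out


def count_columns(df: pd.DataFrame, target: str) -> dict:
    """Split the predictor columns (everything except the target) into
    category-dtype and non-category columns and count each group."""
    predictors = df.drop(columns=[target])
    n_cat = predictors.select_dtypes(include="category").shape[1]
    return {
        "predictors": predictors.shape[1],
        "category": n_cat,
        "non_category": predictors.shape[1] - n_cat,
    }


# Toy data (made up): two binary flags, one continuous column, and a target.
df = pd.DataFrame({
    "flag_a": [0, 1, 0, 1],
    "flag_b": [1, 1, 0, 0],
    "amount": [3.2, 1.5, 7.8, 4.4],
    "target": [0, 1, 0, 1],
})
df = cast_binary_to_category(df, target="target")
print(count_columns(df, target="target"))
# {'predictors': 3, 'category': 2, 'non_category': 1}
```

The target is excluded from the cast on purpose, so that only predictor columns change dtype.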

Hi @reza1615: Can you upgrade your featurewiz library to the latest version and try again? Judging from your output, I believe you are using an older version. I will update the library if you see any problems with the latest version. Thanks

Hi @reza1615
After looking at your problem some more, I realized that SULOV is meant only for finding correlations among numerical vars 👍

Hence, I skip SULOV processing for categorical vars, but I absolutely include them during XGBoost feature selection. See my code snippet from featurewiz/FE_perform_recursive_xgboost below:
dtrain = xgb.DMatrix(X_train, y_train, enable_categorical=True, feature_names=cols_sel)

Can you please double-check this is working for you?
Thanks
Auto Vimal
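The behavior described in this reply (correlation filtering on numeric columns only, with categorical columns passed straight through to the XGBoost stage) can be sketched in pandas. This is a simplified, hypothetical illustration, not featurewiz's actual code. `split_by_dtype` and `drop_correlated_numeric` are made-up names, and where SULOV (per the featurewiz documentation) keeps whichever member of a correlated pair has the higher mutual information with the target, this sketch simply keeps the first column of each pair.

```python
import pandas as pd


def split_by_dtype(df: pd.DataFrame, target: str):
    """Hypothetical helper: separate numeric predictors (eligible for a
    correlation filter) from category-dtype predictors (which skip it)."""
    predictors = df.drop(columns=[target])
    numeric_cols = predictors.select_dtypes(include="number").columns.tolist()
    categorical_cols = predictors.select_dtypes(include="category").columns.tolist()
    return numeric_cols, categorical_cols


def drop_correlated_numeric(df: pd.DataFrame, numeric_cols, corr_limit=0.7):
    """Simplified correlation filter: for every pair of numeric columns whose
    absolute Pearson correlation exceeds corr_limit, drop the later column.
    (SULOV's mutual-information tie-break is deliberately omitted.)"""
    corr = df[numeric_cols].corr().abs()
    dropped = set()
    for i, a in enumerate(numeric_cols):
        if a in dropped:
            continue
        for b in numeric_cols[i + 1:]:
            if b not in dropped and corr.loc[a, b] > corr_limit:
                dropped.add(b)
    return [c for c in numeric_cols if c not in dropped]


# Toy data (made up).
df = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 4, 6, 8, 10, 12],  # perfectly correlated with x1
    "x3": [5, 1, 4, 2, 6, 3],    # weakly correlated with x1 (r is about 0.09)
    "c1": pd.Categorical(["a", "b", "a", "b", "a", "b"]),
    "target": [0, 1, 0, 1, 0, 1],
})

numeric_cols, categorical_cols = split_by_dtype(df, target="target")
kept_numeric = drop_correlated_numeric(df, numeric_cols, corr_limit=0.7)

# Categorical columns bypass the correlation step but are still handed to the
# model-based selection stage alongside the surviving numeric columns.
features_for_xgboost = kept_numeric + categorical_cols
print(features_for_xgboost)  # ['x1', 'x3', 'c1']
```

Keeping the category columns in the final list is the point of the sketch: skipping the correlation step is not the same as dropping the columns.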


Thank you for your time. I will update featurewiz based on the repo and also skip SULOV to see what happens.


@AutoViML: The new version doesn't remove these category columns, but its process seems different from the old version:
1- My PC crashed because it ran out of RAM (I have 64 GB RAM + 270 GB swap), but the old version doesn't have this problem.
2- The new version selected 3600 features while the old version selected 2700, and the old version's selection gives a higher F1.
I can open a ticket for that.

@reza1615 👍 I suggest you stick with the old version if it works better. I have made so many changes to the new version that it will be difficult to find out what is different. You can post the version numbers of the old and new versions here so I can take a look when I have some time.
Auto Vimal
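To answer the request for version numbers, one option is to read the installed version with the standard library's `importlib.metadata` (Python 3.8+), running it once in the environment with the old release and once in the environment with the new one. The helper name `installed_version` is hypothetical; it returns `None` when the package is not installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version of `package`,
    or None if it is not installed in the current environment."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if featurewiz is not installed here.
print("featurewiz", installed_version("featurewiz"))
```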