solegalli / feature-selection-for-machine-learning

Code repository for the online course Feature Selection for Machine Learning

Home Page: https://www.courses.trainindata.com/p/feature-selection-for-machine-learning

How to create new features from all combinations of existing features

Sandy4321 opened this issue · comments

Is your feature request related to a problem? Please describe.
If we have categorical features, how can we create new features from all combinations of the existing ones?
In real life, categorical features are NOT independent; many of them depend on one another.

Even scikit-learn cannot do this. Will you support it?

Related to
PacktPublishing/Python-Feature-Engineering-Cookbook#1
Describe the solution you'd like
For example, the maximum number of features to combine is given: 2, 4 or 5.

For a pandas DataFrame you can use string concatenation:
https://stackoverflow.com/questions/19377969/combine-two-columns-of-text-in-dataframe-in-pandas-python

columns = ['whatever', 'columns', 'you', 'choose']
df['period'] = df[columns].astype(str).sum(axis=1)

So for all three-feature combinations of 11 features, three nested loops do not seem like a good way to do this:
for i in range(11):
    for j in range(i + 1, 11):
        for k in range(j + 1, 11):
            ...  # combine features i, j and k

You need to get 165 new features (11 choose 3) from all combinations (not permutations),
so you end up with many new features.
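
One way to avoid hand-written nested loops is `itertools.combinations` from the standard library, which yields each unordered k-tuple of column names exactly once. The sketch below is only an illustration under assumptions: the helper name `add_combined_features`, the `_` separator and the toy columns `f0` to `f10` are made up for this example and are not part of scikit-learn, Feature-engine or the course code.

```python
from itertools import combinations
from math import comb

import pandas as pd


def add_combined_features(df, columns, k, sep="_"):
    """Return df plus one new column per k-way combination of `columns`.

    Hypothetical helper for illustration only.
    """
    new_cols = {}
    for combo in combinations(columns, k):
        # Join the row values with a separator so that ("ab", "c") and
        # ("a", "bc") do not collapse into the same combined category.
        new_cols[sep.join(combo)] = (
            df[list(combo)].astype(str).agg(sep.join, axis=1)
        )
    # Build all new columns first, then attach them in a single concat.
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)


# Toy data: 11 made-up categorical columns named f0 .. f10
cols = [f"f{i}" for i in range(11)]
df = pd.DataFrame({c: ["a", "b", "a"] for c in cols})

combined = add_combined_features(df, cols, k=3)
new_cols = [c for c in combined.columns if c not in cols]
print(len(new_cols), comb(11, 3))  # 165 165
```

The number of new columns grows quickly with `k` and with the number of input features, and each combined column can have many more categories than its parents, so capping `k` (as suggested above) is probably necessary in practice.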

"Another alternative that I've seen from some Kaggle masters is to join the categories in 2 different variables, into a new categorical variable, so for example, if you have the variable gender, with the values female and male, for observations 1 and 2, and the variable colour with the value blue and green for observations 1 and 2 respectively, you could create a 3rd categorical variable called gender-colour, with the values female-blue for observation 1 and male-green for observation 2. Then you would have to apply the encoding methods from section 3 to this new variable."
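
The gender-colour example from the quote can be sketched in pandas as follows. The encoding methods from section 3 are not described in this thread, so the one-hot step with `pd.get_dummies` is only an assumed stand-in for whichever encoder would actually be used.

```python
import pandas as pd

# Two observations, as in the quoted example
df = pd.DataFrame(
    {
        "gender": ["female", "male"],
        "colour": ["blue", "green"],
    }
)

# Join the two categorical variables into a third one
df["gender-colour"] = df["gender"] + "-" + df["colour"]

# Assumed encoding step: one-hot encode the new variable
encoded = pd.get_dummies(df["gender-colour"], prefix="gender-colour")

print(df["gender-colour"].tolist())  # ['female-blue', 'male-green']
```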

Hi @Sandy4321, thank you for participating in the project. This repo belongs to a course on feature selection, so I think this issue is better placed in my other repo, Feature-engine. I believe you have already opened an issue there. Thank you.