taekyunk / factor_lump

About

Custom transformers for working with categorical variables

  • FactorLumpProp(prop=0.05): similar to fct_lump_prop() in R (forcats)
  • FactorLumpN(top_n=5): similar to fct_lump_n() in R (forcats)
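
The lumping rule behind these two transformers can be illustrated in plain Python. This is a minimal sketch, not the repository's implementation: the class names LumpProp and LumpN, the "Other" label, and the choice to keep levels whose share is at least prop are all assumptions made for illustration. It only mimics the fit/transform pattern so that the kept levels are learned from training data and reused on new data.

```python
from collections import Counter

OTHER = "Other"  # assumed label for the lumped level


class LumpProp:
    """Plain-Python sketch: lump levels whose training share is below `prop`."""

    def __init__(self, prop=0.05, other=OTHER):
        self.prop = prop
        self.other = other

    def fit(self, values):
        counts = Counter(values)
        total = sum(counts.values())
        # Levels to keep are learned from the training data only.
        self.keep_ = {lvl for lvl, c in counts.items() if c / total >= self.prop}
        return self

    def transform(self, values):
        # Rare levels and levels unseen during fit both map to `other`.
        return [v if v in self.keep_ else self.other for v in values]


class LumpN(LumpProp):
    """Plain-Python sketch: keep only the `top_n` most frequent levels."""

    def __init__(self, top_n=5, other=OTHER):
        self.top_n = top_n
        self.other = other

    def fit(self, values):
        self.keep_ = {lvl for lvl, _ in Counter(values).most_common(self.top_n)}
        return self


train = ["a"] * 60 + ["b"] * 37 + ["c"] * 3
print(LumpProp(prop=0.05).fit(train).transform(["a", "c", "z"]))
print(LumpN(top_n=2).fit(train).transform(["a", "b", "c"]))
```

Learning the kept levels in fit() rather than in transform() means a level that is rare in training stays lumped even if it happens to be common in a later batch.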

Custom transformer DropHighlyCorrelated(threshold, candidate)
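
One way such a filter could work is sketched below in plain Python. The README does not describe the parameters, so this assumes that threshold is a cutoff on the absolute Pearson correlation and that candidate lists the columns that are allowed to be dropped. The helper names pearson and columns_to_drop are hypothetical and not taken from the repository.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined for constant columns (division by zero)


def columns_to_drop(data, threshold, candidate):
    """Greedy pass: drop a candidate column if it is too correlated with a kept one."""
    kept = [name for name in data if name not in candidate]
    dropped = []
    for name in candidate:
        if any(abs(pearson(data[name], data[k])) > threshold for k in kept):
            dropped.append(name)
        else:
            kept.append(name)
    return dropped


data = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x_scaled": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with x
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(columns_to_drop(data, threshold=0.9, candidate=["x_scaled", "noise"]))
```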

Other utility functions

  • read_cp(): read object using cloudpickle
  • write_cp(): write object using cloudpickle
  • find_lift(): compute lift and return a DataFrame
  • find_prop(): compute the frequency and proportion of each value in a pd.Series
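
The quantities reported by find_prop() and find_lift() can be illustrated with a small plain-Python sketch. The repository's functions work with pandas objects and their exact output is not shown here, so prop_table and lift_table below are hypothetical stand-ins. The lift definition used, P(target = 1 | level) / P(target = 1) for a 0/1 target, is an assumption.

```python
from collections import Counter


def prop_table(values):
    """Return (level, count, proportion) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(lvl, c, c / total) for lvl, c in counts.most_common()]


def lift_table(levels, target):
    """Return {level: P(target=1 | level) / P(target=1)} for a 0/1 target."""
    base_rate = sum(target) / len(target)
    hits, seen = Counter(), Counter()
    for lvl, t in zip(levels, target):
        seen[lvl] += 1
        hits[lvl] += t
    return {lvl: (hits[lvl] / seen[lvl]) / base_rate for lvl in seen}


levels = ["a", "a", "b", "b"]
target = [1, 1, 1, 0]
print(prop_table(levels))
print(lift_table(levels, target))
```

Under this definition, a lift above 1 means the level has a higher positive rate than the data as a whole, and a lift below 1 means a lower one.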

Class FeatureImportance() adapted from

Note

  • OneHotEncoder() in scikit-learn gains options to collapse infrequent factor levels (min_frequency and max_categories, added in scikit-learn 1.1)
  • This code is a temporary solution for environments where the newer scikit-learn is not available

Author

Languages

Language: Python 100.0%