sasasagagaga / BinsEncoder

Binning

Data binning, also called discrete binning or bucketing, is a data pre-processing technique used to reduce the effects of minor observation errors. Original data values that fall into a given small interval (a bin) are replaced by a value representative of that interval, often its central value. It is a form of quantization.
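As a rough illustration of the idea, the sketch below splits the observed range into equal-width bins and replaces each value with the center of its bin. The function name `bin_to_centers` and the equal-width strategy are assumptions made for this example; they are not taken from this repository's code.

```python
from bisect import bisect_right


def bin_to_centers(values, n_bins):
    """Replace each value with the center of its equal-width bin.

    Hypothetical helper for illustration only, not part of this repository.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # n_bins + 1 edges spanning the observed range
    edges = [lo + i * width for i in range(n_bins + 1)]
    centers = [(a + b) / 2 for a, b in zip(edges[:-1], edges[1:])]
    # Only the interior edges are needed to locate a bin; this also
    # places the maximum value in the last bin instead of past it.
    inner = edges[1:-1]
    return [centers[bisect_right(inner, v)] for v in values]


smoothed = bin_to_centers([0.1, 0.2, 0.9, 1.1, 1.9, 2.0], n_bins=2)
```

With two bins over the range 0.1 to 2.0, the first three values share one representative value and the last three share another, which is how small differences between nearby observations get smoothed out.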

Two classes are implemented here for data binning: BinsEncoder and BinsDiscretizer.

  • BinsDiscretizer splits the values of a real-valued feature vector into bins.
  • BinsEncoder encodes each bin with a single value.
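The two steps can be sketched in plain Python as follows. This is not the API of BinsDiscretizer or BinsEncoder: the function names, the fixed bin borders, and the choice of the per-bin target mean as the encoding value are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def discretize(values, inner_edges):
    """Map each value to the index of its bin (hypothetical helper).

    inner_edges must be sorted; k edges produce k + 1 bins.
    """
    return [bisect_right(inner_edges, v) for v in values]


def fit_bin_encoding(bin_ids, targets):
    """Compute one value per bin, here the mean target in that bin.

    The mean is an assumed encoding; other statistics would work too.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for b, t in zip(bin_ids, targets):
        sums[b] += t
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sums}


# Toy data, invented for this example
ages = [18, 22, 35, 41, 67, 70]
bought = [0, 1, 1, 1, 0, 0]

bin_ids = discretize(ages, inner_edges=[30, 60])   # step 1: split into bins
encoding = fit_bin_encoding(bin_ids, bought)       # step 2: one value per bin
encoded = [encoding[b] for b in bin_ids]           # encoded feature vector
```

Keeping the two steps separate means the same bin borders can be reused with a different encoding, which mirrors the split into a discretizer class and an encoder class described above.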

Example

An example of BinsEncoder usage can be found here.

TODO list

  • Check inputs and add asserts
  • Possibly make the encode_bins parameter more flexible
  • Handle empty bin borders correctly
  • Handle NaNs correctly

Languages

Jupyter Notebook 96.0%, Python 4.0%