Implementing ChiMerge [1992] in Python
ChiMerge was introduced by Randy Kerber in 1992. It is an algorithm that uses the chi-square (X2) statistic to discretize numeric attributes, a useful preprocessing step that reduces the computational cost of subsequent data mining.
ChiMerge consists of the following basic steps:
- Sort the data in ascending order.
- Define the initial intervals so that every value is in a separate interval.
- Calculate the X2 value of every pair of adjacent intervals.
- Find the smallest X2 value and merge the pair of adjacent intervals that produced it.
- Repeat the previous two steps until every X2 value is larger than the threshold.
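The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not necessarily how this project implements them: the function names `chi2` and `chimerge`, the use of `Counter` objects for class counts, and the choice to give each distinct value (rather than each row) its own initial interval are all assumptions made for the example. Classes that do not occur in either interval are simply skipped, so no expected count is ever zero.

```python
from collections import Counter


def chi2(a, b):
    """X2 statistic for two adjacent intervals, each given as a Counter of class labels."""
    n_a, n_b = sum(a.values()), sum(b.values())
    total = n_a + n_b
    stat = 0.0
    for cls in set(a) | set(b):          # only classes present in at least one interval
        col = a[cls] + b[cls]            # column total for this class
        for counts, n_row in ((a, n_a), (b, n_b)):
            expected = n_row * col / total
            stat += (counts[cls] - expected) ** 2 / expected
    return stat


def chimerge(values, labels, threshold):
    """Return the lower bounds of the intervals left after merging (hypothetical helper)."""
    # Sort the data and start with one interval per distinct value.
    per_value = {}
    for v, y in zip(values, labels):
        per_value.setdefault(v, Counter())[y] += 1
    intervals = [[v, c] for v, c in sorted(per_value.items())]  # [lower bound, class counts]

    while len(intervals) > 1:
        # X2 of every pair of adjacent intervals.
        stats = [chi2(intervals[i][1], intervals[i + 1][1])
                 for i in range(len(intervals) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] > threshold:         # every X2 exceeds the threshold: stop
            break
        # Merge the pair with the smallest X2.
        intervals[i][1] = intervals[i][1] + intervals[i + 1][1]
        del intervals[i + 1]
    return [lower for lower, _ in intervals]


# Toy data: two classes that separate cleanly between 2 and 3.
print(chimerge([1, 2, 3, 4], ["a", "a", "b", "b"], threshold=2.7))
```

The threshold is normally taken from a chi-square table for a chosen significance level, with degrees of freedom equal to the number of classes minus one; the value 2.7 used here is just a placeholder for the toy data.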
This project uses the Iris data set as an example. You can download the data from https://archive.ics.uci.edu/ml/machine-learning-databases/iris/
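As a rough sketch of how the downloaded data might be read, the snippet below parses comma-separated rows of four numeric measurements followed by a class label. The helper name `parse_iris`, the local file name `iris.data`, and the assumption that the file has no header row are illustrative; check them against the file you actually download.

```python
import csv
import io


def parse_iris(text):
    """Parse Iris-style CSV text into (features, label) pairs (hypothetical helper)."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:                   # skip blank lines, e.g. at the end of the file
            continue
        *features, label = record
        rows.append(([float(x) for x in features], label))
    return rows


# Two sample lines in the assumed format, followed by a blank line.
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n7.0,3.2,4.7,1.4,Iris-versicolor\n\n"
rows = parse_iris(sample)
print(rows)

# With a downloaded copy (file name assumed):
# with open("iris.data") as f:
#     rows = parse_iris(f.read())
```

Each numeric column can then be discretized separately by passing that column's values together with the class labels to the discretization routine.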