CharMax

Data Mining using Characteristic Sets and Maximal Consistent Blocks for Incomplete Data Sets.

Abstract

Searching data for hidden connections and using them to predict future trends has a long history. In a data-driven world, data mining is an important way to extract knowledge and insights from data in many forms. It uncovers previously unknown, credible patterns that help solve many problems. Data mining techniques include classification, clustering, and prediction. Here we focus on classification through rule induction, carried out with two different methods.

We compare the complexity of rule sets induced using characteristic sets and maximal consistent blocks. The complexity of a rule set is measured by the total number of rules induced for a given data set and the total number of conditions in those rules. We induced rules from incomplete data sets, that is, data sets with missing attribute values. Both methods were implemented and analyzed to see how each influences rule set complexity. Preliminary results suggest that the choice between characteristic sets and generalized maximal consistent blocks is inconsequential. However, the cardinality of the rule sets is always smaller for incomplete data sets with “do not care” conditions. Thus, the choice between interpretations of missing attribute values is more important than the choice between characteristic sets and generalized maximal consistent blocks.
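To make the terms above concrete, here is a minimal sketch of how characteristic sets and the two complexity measures could be computed. It is not the code from this repository. It assumes the common rough-set convention in which a lost value is written `?` and a "do not care" condition is written `*`. The function names, the toy table, and the rule representation (a list of conditions paired with a decision) are all hypothetical. Maximal consistent blocks are not covered.

```python
from typing import Dict, List, Set, Tuple

LOST = "?"          # assumed symbol for a lost value
DO_NOT_CARE = "*"   # assumed symbol for a "do not care" condition

Case = Dict[str, str]
Rule = Tuple[List[Tuple[str, str]], str]  # (conditions, decision), hypothetical format


def attribute_value_blocks(cases: List[Case], attributes: List[str]) -> Dict[Tuple[str, str], Set[int]]:
    """Build the block of case indices for every (attribute, specified value) pair."""
    blocks: Dict[Tuple[str, str], Set[int]] = {}
    for a in attributes:
        specified = {c[a] for c in cases if c[a] not in (LOST, DO_NOT_CARE)}
        for v in specified:
            blocks[(a, v)] = set()
        for i, c in enumerate(cases):
            if c[a] == LOST:
                continue  # a lost value joins no block of this attribute
            if c[a] == DO_NOT_CARE:
                for v in specified:  # "do not care" joins every block of this attribute
                    blocks[(a, v)].add(i)
            else:
                blocks[(a, c[a])].add(i)
    return blocks


def characteristic_sets(cases: List[Case], attributes: List[str]) -> List[Set[int]]:
    """For each case, intersect the blocks of its specified attribute values."""
    blocks = attribute_value_blocks(cases, attributes)
    universe = set(range(len(cases)))
    result = []
    for c in cases:
        k = set(universe)
        for a in attributes:
            if c[a] not in (LOST, DO_NOT_CARE):
                k &= blocks[(a, c[a])]
            # a missing value contributes the whole universe, so nothing to intersect
        result.append(k)
    return result


def rule_set_complexity(rules: List[Rule]) -> Tuple[int, int]:
    """Return (number of rules, total number of conditions over all rules)."""
    return len(rules), sum(len(conditions) for conditions, _ in rules)


# Toy incomplete data set (made up for illustration).
attributes = ["Temperature", "Headache"]
cases = [
    {"Temperature": "high", "Headache": "yes"},
    {"Temperature": LOST, "Headache": "yes"},
    {"Temperature": DO_NOT_CARE, "Headache": "no"},
    {"Temperature": "normal", "Headache": "no"},
]

# Toy rule set (also made up) to illustrate the two complexity measures.
rules = [
    ([("Temperature", "high")], "flu"),
    ([("Temperature", "normal"), ("Headache", "no")], "no flu"),
]

if __name__ == "__main__":
    print(characteristic_sets(cases, attributes))
    print(rule_set_complexity(rules))
```

In this sketch the two interpretations differ only in how blocks are built: a case with a lost value is left out of every block for that attribute, while a case with a "do not care" condition is added to all of them.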

References

Clark, P. G., Gao, C., Grzymala-Busse, J. W., Mroczek, T., & Niemiec, R. (2018). Complexity of Rule Sets in Mining Incomplete Data Using Characteristic Sets and Generalized Maximal Consistent Blocks.
Clark, P. G., Gao, C., Grzymala-Busse, J. W., & Mroczek, T. (2018). Characteristic sets and generalized maximal consistent blocks in mining incomplete data. Information Sciences.

