dcme
Overview
The dcme package provides functions to compute data complexity measures.
Installation
dcme is under development and not yet available on CRAN. You can install the development version using the devtools package as follows:
# install.packages("devtools")
devtools::install_github("RomeroBarata/dcme")
Data Complexity Measures
The following complexity measures are currently implemented:
Simple Measures
num_examples: Number of Observations
num_examples_majority: Number of Observations in the Majority Class
num_examples_minority: Number of Observations in the Minority Class
num_features: Number of Features
num_features_numeric: Number of Numeric Features
num_features_binary: Number of Binary Features
num_features_categorical: Number of Categorical Features
num_classes: Number of Classes
proportion_examples_majority: Proportion of Majority Examples
proportion_examples_minority: Proportion of Minority Examples
proportion_features_numeric: Proportion of Numeric Features
proportion_features_binary: Proportion of Binary Features
proportion_features_categorical: Proportion of Categorical Features
IR: Imbalance Ratio
num_examples_majority, num_examples_minority, proportion_examples_majority, proportion_examples_minority, and IR are defined only for binary data sets.
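To illustrate what the binary-only measures describe, here is a minimal base-R sketch on a hypothetical toy data set. It does not call dcme and is not the package's implementation; in particular, it assumes the imbalance ratio is the majority-class count divided by the minority-class count.

```r
# Hypothetical toy data: two numeric features and a two-level class factor.
toy <- data.frame(
  x1 = c(1.0, 1.2, 0.9, 1.1, 3.0, 3.2),
  x2 = c(0.5, 0.7, 0.6, 0.4, 2.0, 2.1),
  class = factor(c("a", "a", "a", "a", "b", "b"))
)

# Count the observations in each class.
class_counts <- table(toy$class)

# Majority and minority class sizes.
n_majority <- max(class_counts)
n_minority <- min(class_counts)

# Proportions relative to the total number of observations.
prop_majority <- n_majority / nrow(toy)
prop_minority <- n_minority / nrow(toy)

# Imbalance ratio, assumed here to be majority / minority.
imbalance_ratio <- n_majority / n_minority
```

With more than two classes, "majority" and "minority" no longer identify a unique pair of classes, which is one way to see why these measures are restricted to binary data sets.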
Statistical Measures
sd_ratio: Geometric Mean Ratio of Standard Deviations
corr_abs: Mean Absolute Correlation Coefficient
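As a rough illustration of the idea behind a mean absolute correlation coefficient, the base-R sketch below averages the absolute pairwise feature correlations within each class and then averages across classes. This is an assumption about the definition, following the general description in [1]; it does not call dcme and may differ from the package's corr_abs.

```r
# Built-in iris data: four numeric features and a class label.
x <- iris[, 1:4]
y <- iris$Species

# For each class, average the absolute correlations over all feature pairs.
per_class <- sapply(split(x, y), function(xc) {
  r <- cor(xc)
  mean(abs(r[upper.tri(r)]))  # upper triangle: each pair counted once
})

# Average the per-class values into a single number.
corr_abs_sketch <- mean(per_class)
```

Values near 1 suggest strongly redundant features, while values near 0 suggest features that are close to linearly uncorrelated.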
Measures of Overlap of Individual Feature Values
F1: Fisher's Discriminant Ratio
F2: Volume of Overlap Region
The F1 and F2 measures are currently implemented only for binary data sets. General versions will be made available soon.
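The two-class versions of these measures can be sketched in base R as follows. The code follows the usual formulations from [2] as I understand them: F1 as the maximum over features of the squared mean difference divided by the sum of class variances, and F2 as the product over features of the normalized overlap of the class ranges. Clamping negative overlaps to zero is an assumption of this sketch, and none of it is taken from the dcme source.

```r
# Two-class subset of the built-in iris data.
binary <- droplevels(iris[iris$Species != "setosa", ])
x <- binary[, 1:4]
y <- binary$Species
x1 <- x[y == levels(y)[1], ]
x2 <- x[y == levels(y)[2], ]

# Per-feature Fisher ratio: squared mean difference over summed variances.
fisher <- (colMeans(x1) - colMeans(x2))^2 /
  (sapply(x1, var) + sapply(x2, var))
f1_sketch <- max(fisher)  # the most discriminative single feature

# Per-feature overlap of the two class ranges, clamped at zero.
overlap <- pmax(0, pmin(sapply(x1, max), sapply(x2, max)) -
                   pmax(sapply(x1, min), sapply(x2, min)))

# Per-feature total range spanned by both classes.
total_range <- pmax(sapply(x1, max), sapply(x2, max)) -
  pmin(sapply(x1, min), sapply(x2, min))

# Product of the normalized overlaps across features.
f2_sketch <- prod(overlap / total_range)
```

A large F1 means at least one feature separates the classes well; an F2 of zero means at least one feature has non-overlapping class ranges.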
Measures of Separability of Classes
N2: Ratio of Average Intra/Inter Class 1-NN Distance
N3: Error Rate of 1-NN Classifier
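Both measures are built on nearest-neighbor distances, which the following base-R sketch makes concrete. It assumes Euclidean distance, a leave-one-out estimate for the 1-NN error, and the sum-of-distances form of the intra/inter ratio described in [2]; the dcme functions may make different choices.

```r
x <- as.matrix(iris[, 1:4])
y <- as.character(iris$Species)

# Pairwise Euclidean distances; a point must not be its own neighbor.
d <- as.matrix(dist(x))
diag(d) <- Inf

# TRUE where two observations share a class label.
same_class <- outer(y, y, "==")

# Distance to the nearest neighbor of the same class and of another class.
intra <- apply(ifelse(same_class, d, Inf), 1, min)
inter <- apply(ifelse(same_class, Inf, d), 1, min)

# N2-style ratio: small values indicate well-separated classes.
n2_sketch <- sum(intra) / sum(inter)

# N3-style leave-one-out error: label each point by its nearest neighbor.
nn <- apply(d, 1, which.min)
n3_sketch <- mean(y[nn] != y)
```

Computing the full distance matrix needs memory quadratic in the number of observations, so this sketch is only suitable for small data sets.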
Measures of Geometry, Topology, and Density of Manifolds
N4: Nonlinearity of the 1-NN Classifier
T2: Average Number of Points per Dimension
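The sketch below shows one way these two measures can be computed in base R. T2 is simply the number of observations divided by the number of features. For N4 it follows the procedure described in [2] as I understand it: new points are created by linear interpolation between random pairs of same-class training points, and the 1-NN classifier trained on the original data is evaluated on them. The number of interpolated points (300) and the seed are arbitrary choices of this sketch, not dcme defaults.

```r
set.seed(1)
x <- as.matrix(iris[, 1:4])
y <- as.character(iris$Species)

# T2: average number of points per dimension.
t2_sketch <- nrow(x) / ncol(x)

# Draw random points and, for each, a random partner from the same class.
n_new <- 300
idx_a <- sample(nrow(x), n_new, replace = TRUE)
idx_b <- sapply(idx_a, function(i) {
  pool <- which(y == y[i])
  pool[sample.int(length(pool), 1)]
})

# Interpolate between each pair with a random coefficient in [0, 1].
alpha <- runif(n_new)
x_new <- alpha * x[idx_a, ] + (1 - alpha) * x[idx_b, ]

# Classify each interpolated point by its nearest training point.
nearest <- apply(x_new, 1, function(p) {
  which.min(colSums((t(x) - p)^2))
})

# N4-style nonlinearity: error rate on the interpolated points.
n4_sketch <- mean(y[nearest] != y[idx_a])
```

Because the interpolated points are random, the N4 value varies from run to run unless the seed is fixed.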
References
Definitions and explanations of most functions implemented in the dcme package can be found in the following literature:
[1] Michie, D., Spiegelhalter, D. J., & Taylor, C. C. (1994). Machine learning, neural and statistical classification.
[2] Ho, T. K., & Basu, M. (2002). Complexity measures of supervised classification problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3), 289-300.