Process original datasets to create static train/test splits
johnantonn opened this issue
Datasets are subsampled to at most 5000 points and then split into train/test sets with a stratified split, using train_size = 70% (test_size = 30%):
- Training set: 3500 points or fewer
- Test set: 1500 points or fewer
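The subsample-then-split step can be sketched as below. This is a minimal, standard-library-only illustration, not the code used in this repository: the function name `stratified_split`, its parameters, and the toy labels are all hypothetical. In practice the same split is usually done with scikit-learn's `train_test_split(..., stratify=y)`.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.3, max_points=5000, seed=0):
    """Return (train_idx, test_idx) as sorted lists of indices into `labels`.

    Hypothetical helper: subsamples to at most `max_points` while keeping
    class proportions, then splits each class separately so that train and
    test both keep (approximately) the original outlier ratio.
    """
    rng = random.Random(seed)

    # Group sample indices by class label (e.g. 0 = normal, 1 = outlier).
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    # Fraction of each class to keep so the total is capped at max_points.
    keep_frac = min(1.0, max_points / len(labels))

    train_idx, test_idx = [], []
    for _, idx in sorted(by_class.items()):
        rng.shuffle(idx)
        idx = idx[:round(len(idx) * keep_frac)]   # per-class subsample
        n_test = round(len(idx) * test_size)      # per-class test share
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (made up for illustration): 9000 normal points, 1000 outliers.
toy_labels = [0] * 9000 + [1] * 1000
train_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(test_idx))
```

Datasets with fewer than 5000 points skip the subsampling (`keep_frac` is 1.0), which is why some training sets end up smaller than 3500.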
Details about the training sets (from which the validation sets will be generated):
- ALOI
  - Total: 3500
  - Normal: 3383
  - Outliers: 117
- Annthyroid
  - Total: 3500
  - Normal: 3232
  - Outliers: 268
- Waveform
  - Total: 2410
  - Normal: 2340
  - Outliers: 70
- Cardiotocography
  - Total: 1479
  - Normal: 1153
  - Outliers: 326
- PageBlocks
  - Total: 3500
  - Normal: 3171
  - Outliers: 329
- SpamBase
  - Total: 2944
  - Normal: 1769
  - Outliers: 1175
Remove the PageBlocks dataset from the experiments, since the results so far show it is not useful, and add the KDDCUP99 dataset, subsampling only the normal class.
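One possible reading of "subsampling only the normal class" is to keep every outlier and draw a random subset of the normal points until the 5000-point cap is reached. The sketch below assumes that reading. The function name `subsample_normals`, the label convention (0 = normal), and the toy counts are hypothetical and are not taken from KDDCUP99 itself.

```python
import random


def subsample_normals(labels, max_points=5000, normal_label=0, seed=0):
    """Return sorted indices that keep all outliers and a random subset of normals.

    Hypothetical helper: assumes any label other than `normal_label`
    marks an outlier.
    """
    rng = random.Random(seed)
    normal_idx = [i for i, y in enumerate(labels) if y == normal_label]
    outlier_idx = [i for i, y in enumerate(labels) if y != normal_label]

    # Fill the remaining budget with normals; never ask for more than exist.
    n_normals = max(0, min(len(normal_idx), max_points - len(outlier_idx)))
    kept_normals = rng.sample(normal_idx, n_normals)
    return sorted(kept_normals + outlier_idx)


# Toy labels (made up for illustration): 10000 normal points, 200 outliers.
toy_labels = [0] * 10000 + [1] * 200
kept = subsample_normals(toy_labels)
print(len(kept), sum(toy_labels[i] for i in kept))
```

Keeping all outliers raises the outlier ratio of the subsample compared with the full dataset, so the result should be checked against the ratio the experiments expect.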