rmschubert / RegressionVQ

Regression Vector Quantization based on Prototorch Framework

Description

This repository contains experiments on the Regression Neural Gas (RegNG) and Regression-Sensitive Neural Gas (RegSeNG) [source].

Several other models are implemented as well: a Radial-Basis-Function Network ([RBFN]), Regression Learning Vector Quantization ([RLVQ]), Neural Gas for time-series prediction ([NGTSP]), and a variant of the latter in which the predictor depends on the data samples rather than on the distance vector (xNGTSP / RNGTSP).

Experiments


The file experiments.py contains the code used to produce the results of a comparison of these models. This section describes the experimental setting. A comprehensive overview of the results, including a summary, is available in the results folder.

Datasets


The datasets included are WineQuality-red, California Housing, Breastcancer Prognostics and Diabetes. All datasets are normalized to the range [0, 1]. Furthermore, different targets can be chosen for WineQuality and Breastcancer. For WineQuality we chose the target alcohol in experiments.py. For Breastcancer we used the mean perimeter as the target, removed the columns ID and Outcome (for the sake of normalization), and dropped the column Lymph Node Status due to missing values.
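The normalization to [0, 1] can be sketched as a per-column min-max scaling; this is a minimal illustration, and the exact preprocessing in the repository may differ:

```python
import numpy as np

def minmax_normalize(X):
    """Scale each column of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    mins = X.min(axis=0)
    ranges = X.max(axis=0) - mins
    ranges[ranges == 0] = 1.0  # keep constant columns finite
    return (X - mins) / ranges
```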

Parameter Setting and Modelling


All models were initialized via $k$-Means, except for the hybrid ones (NGTSP, xNGTSP), which used Neural Gas. Training ran for 100 epochs in total, except for California Housing, which was trained for 10 epochs due to the size of the dataset. For the visibility parameter in Neural Gas we used $\lambda_{NG}(t) = 10 \cdot 0.95^t$ at training time $t$ to initialize the Neural Gas prototypes for NGTSP and RNGTSP. For the regression setting we chose $\lambda(t) = 0.999^t$ and $\lambda_{reg}(t) = 0.5 \cdot \lambda(t)$, and for the balancing $\alpha(t) = 0.99^t$. For the learning rate we used an exponential decay $\epsilon(t) = 0.01^t$. Furthermore, the RBFs are modelled as

$$g_{RBF}(\sigma_i, x, p_i) = \exp\left(- \sigma_i ||x - p_i||^2\right)$$

for the RBFN with prototype (center) $p_i$ and deviation $\sigma_i$, and as

$$g_{Reg(Se)NG}(\lambda_{reg}(t), x, p_i) = \exp\left(- \frac{||x - p_i||^2}{\lambda_{reg}(t)}\right)$$

for RegNG and RegSeNG. Further, for the parameter $\sigma_P$ in RLVQ we chose $\sigma_P(0) = 5$ and a schedule similar to the one in [RLVQ].
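The two kernels above can be sketched as follows; this is an illustrative NumPy version, not the repository's Prototorch implementation:

```python
import numpy as np

def g_rbf(sigma_i, x, p_i):
    """Gaussian RBF with per-center width sigma_i: exp(-sigma_i * ||x - p_i||^2)."""
    return np.exp(-sigma_i * np.sum((x - p_i) ** 2))

def g_reg_ng(lambda_reg_t, x, p_i):
    """Visibility kernel for RegNG / RegSeNG: exp(-||x - p_i||^2 / lambda_reg(t))."""
    return np.exp(-np.sum((x - p_i) ** 2) / lambda_reg_t)
```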

Furthermore, a batch-normalization layer was applied to accelerate training and enhance reproducibility.
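The decay schedules described in this section can be written as plain functions of the epoch $t$; the names below are illustrative, not the repository's API:

```python
def lambda_ng(t):
    """Neural Gas visibility for prototype initialization: 10 * 0.95^t."""
    return 10 * 0.95 ** t

def lam(t):
    """Regression visibility schedule: 0.999^t."""
    return 0.999 ** t

def lambda_reg(t):
    """Regression kernel width: 0.5 * lambda(t)."""
    return 0.5 * lam(t)

def alpha(t):
    """Balancing schedule: 0.99^t."""
    return 0.99 ** t

def epsilon(t):
    """Learning-rate decay: 0.01^t."""
    return 0.01 ** t
```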

Validation and Measures


We used 5-fold cross-validation with 5, 10, and 15 prototypes each. As validation measures we used the coefficient of determination $r^2$ and the standard error $sep$ (both provided by [scipy]).

Note that there is an additional measure, $err10$, which gives the percentage of predictions deviating from the target by at most $10\%$ (in the $L_1$ norm).
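A minimal sketch of these measures is given below, assuming $err10$ counts deviations relative to the target magnitude; the repository's exact definition may differ:

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination r^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def err10(y_true, y_pred):
    """Fraction of predictions within 10% (L1 deviation) of the target.

    Assumption: deviation is measured relative to |target|.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred) <= 0.1 * np.abs(y_true))
```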

The file summary.csv also records the maximum value achieved for each measure.
