deep-neural-networks hyperparameter-importance hyperparameter-optimization hyperparameter-search hyperparameter-tuning

mlp_hpp_analysis

This repository contains the code for the paper entitled "Exploring the Intricacies of Neural Network Optimization".

Before using

Install the dependencies listed in requirements.txt with the command pip install -r requirements.txt

To use this module

Write one or more .json files describing the experiments you want to perform.
Run the experiments with the command python code/run.py --hyper path_to_the_folder, where path_to_the_folder is the folder containing those .json files.
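As a starting point, the sketch below writes one experiment file. The JSON schema is hypothetical: the keys reuse the hyperparameter names from the result tables in this README, but the structure that code/run.py actually expects is an assumption, so compare it with the files in the hyperparameters folder before relying on it.

```python
import json
from pathlib import Path

# HYPOTHETICAL schema: keys mirror the hyperparameter names in this README's
# result tables; the format code/run.py really expects may differ.
example_experiment = {
    "dataset": "abalone",  # hypothetical key
    "activation_functions": ["relu", "selu"],
    "batch_size": [256, 512],
    "loss": ["mean_squared_error"],
    "optimizer": ["adam"],
    "learning_rate": [0.001],
}


def write_experiment(folder, name, experiment):
    """Write one experiment description to <folder>/<name>.json and return its path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)  # create the experiments folder if needed
    path = folder / f"{name}.json"
    path.write_text(json.dumps(experiment, indent=2))
    return path
```

For example, write_experiment("my_experiments", "abalone_small_grid", example_experiment) creates my_experiments/abalone_small_grid.json, and python code/run.py --hyper my_experiments would then be pointed at that folder, provided the schema matches what the script reads.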

Execute the experiments performed in the paper

The hyperparameters folder contains one subfolder for each tested dataset.

To run every experiment at once, use the all_runs folder. Alternatively, run the experiments one folder at a time; this achieves the same results as those presented in the paper.
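To run the dataset folders one at a time, the commands can be generated with a small helper. This is a sketch that assumes the hyperparameters/<dataset>/ layout described above and that all_runs sits next to the dataset folders (it is skipped so no experiment runs twice); build_run_commands is a hypothetical helper, not part of the repository.

```python
import sys
from pathlib import Path


def build_run_commands(hyper_root="hyperparameters", skip=("all_runs",)):
    """Return one `python code/run.py --hyper <folder>` command per dataset folder.

    Assumption: hyper_root holds one subfolder per dataset; folders named in
    `skip` (all_runs by default) are left out to avoid running experiments twice.
    """
    root = Path(hyper_root)
    commands = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        if folder.name in skip:
            continue
        commands.append([sys.executable, "code/run.py", "--hyper", str(folder)])
    return commands
```

Each command can then be executed from the repository root with subprocess.run(cmd, check=True), which stops at the first failing run.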

Keep in mind that the experiments using the binary_crossentropy and sparse_categorical_crossentropy losses are kept in a separate folder, because they require the Y array to be created differently. You can run them separately and then join the CSV results.
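The difference in the Y array comes from the target format each Keras loss expects: categorical_crossentropy takes one-hot vectors, sparse_categorical_crossentropy takes integer class indices, and binary_crossentropy is typically given a single 0/1 target per sample. The helper below is a hypothetical illustration of that encoding step, not the repository's code; it assumes integer class labels starting at 0.

```python
import numpy as np


def encode_targets(y, loss):
    """Shape integer class labels y for a given Keras loss name (illustrative only).

    - categorical_crossentropy: one-hot matrix of shape (n, n_classes)
    - sparse_categorical_crossentropy: integer labels of shape (n,)
    - binary_crossentropy: 0/1 floats of shape (n, 1), two-class data only
    """
    y = np.asarray(y, dtype=int)
    if loss == "categorical_crossentropy":
        n_classes = int(y.max()) + 1
        return np.eye(n_classes)[y]  # row i of the identity is the one-hot code of class i
    if loss == "sparse_categorical_crossentropy":
        return y
    if loss == "binary_crossentropy":
        if not set(np.unique(y)) <= {0, 1}:
            raise ValueError("binary_crossentropy expects labels in {0, 1}")
        return y.astype(float).reshape(-1, 1)
    raise ValueError(f"unsupported loss: {loss}")
```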

Once the experiments have been performed, the results should be available in the results/raw folder.
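Joining the CSV files from separate runs (for example, the binary_crossentropy and sparse_categorical_crossentropy experiments) amounts to concatenating rows that share a header. The sketch below uses only the standard library and assumes every file has the same header row; join_csv_results and any file paths passed to it are hypothetical.

```python
import csv
from pathlib import Path


def join_csv_results(csv_paths, out_path):
    """Concatenate CSV files with identical headers into out_path.

    Returns the number of data rows written; raises ValueError on a header mismatch.
    """
    header = None
    rows = []
    for path in csv_paths:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader)
            if header is None:
                header = file_header  # the first file defines the expected columns
            elif file_header != header:
                raise ValueError(f"header mismatch in {path}")
            rows.extend(reader)
    if header is None:
        raise ValueError("no input files given")
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```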

To preprocess them, run python code/results_preprocess.py, which should create the results/final folder with the preprocessed results.

Then, to obtain the importance of the hyperparameters, run python code/results_analysis.py, which should report the importance for each dataset and the average over the six datasets.
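The three steps above can be chained in a short driver. This is a sketch that assumes it is launched from the repository root; reproduction_pipeline and run_pipeline are hypothetical names, and with dry_run=True it only prints the commands.

```python
import subprocess
import sys


def reproduction_pipeline(hyper_folder):
    """Return the commands for the three steps described above, in order."""
    return [
        # 1. run the experiments; raw results go to results/raw
        [sys.executable, "code/run.py", "--hyper", hyper_folder],
        # 2. preprocess results/raw into results/final
        [sys.executable, "code/results_preprocess.py"],
        # 3. compute hyperparameter importance per dataset and on average
        [sys.executable, "code/results_analysis.py"],
    ]


def run_pipeline(hyper_folder, dry_run=True):
    """Print (dry_run=True) or execute the pipeline, stopping at the first failure."""
    commands = reproduction_pipeline(hyper_folder)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Calling run_pipeline("hyperparameters/all_runs", dry_run=False) would run everything in one go, assuming all_runs is located inside the hyperparameters folder.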

Results

Here we present the results reported in the paper, together with an additional analysis of the obtained results.

If an analysis the reader would like to perform is missing, the complete data obtained from the runs is available in the results folder; alternatively, the reader can run the experiments themselves.

Hyperparameter importance

These are the results of the fANOVA analysis.
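fANOVA splits the variance of a metric across hyperparameters. Common implementations fit a random-forest surrogate to the observed results; the sketch below skips the surrogate and computes the exact main effects on a complete grid, assuming every combination was evaluated. It illustrates the decomposition and is not necessarily what code/results_analysis.py does. Because interactions also carry variance, main-effect percentages need not sum to 100.

```python
import numpy as np


def main_effect_importance(y):
    """Percentage of total variance explained by each hyperparameter's main effect.

    y is a full-factorial grid: y[i0, i1, ..., ik] is the metric for the i-th
    value of each hyperparameter (one axis per hyperparameter). With every
    configuration observed and uniform weights, the main-effect fraction of
    axis d is Var(mean over all other axes) / Var(y).
    """
    y = np.asarray(y, dtype=float)
    total_var = y.var()
    if total_var == 0:
        return np.zeros(y.ndim)  # constant metric: nothing to attribute
    importances = []
    for axis in range(y.ndim):
        others = tuple(a for a in range(y.ndim) if a != axis)
        marginal = y.mean(axis=others)  # average out every other hyperparameter
        importances.append(100.0 * marginal.var() / total_var)
    return np.array(importances)
```

For a purely additive grid such as [[0, 1], [2, 3]], the two main effects explain 80% and 20% of the variance and sum to 100%.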

General Importance

	All Datasets
Hyperparameter	Performance	Training Time	Inference Time
activation_functions	18.42	3.2	6.99
batch_size	0.95	55.94	37.67
loss	12.23	0.33	2.1
optimizer	14.88	5.17	2.16
learning_rate	17.65	3.38	1.34
hidden_layer_dim	3.94	3.85	16.62
hidden_layer_size	3.94	3.61	6.29

Importance by dataset type

	Classification
Hyperparameter	Performance	Training Time	Inference Time
activation_functions	17.59	2.91	2.28
batch_size	1.31	57.3	37.43
loss	9.16	0.01	3.76
optimizer	17.11	3.78	4.53
learning_rate	21.4	4.69	0.01
hidden_layer_dim	6.13	0.67	19.37
hidden_layer_size	3.04	5.2	8.54

	Regression
Hyperparameter	Performance	Training Time	Inference Time
activation_functions	23.66	2.87	15.98
batch_size	4.49	64.87	37.22
loss	19.51	0.12	0.01
optimizer	7.4	8.33	0.12
learning_rate	18.09	1.38	3.26
hidden_layer_dim	2.1	2.2	12.22
hidden_layer_size	3.32	1.48	4.18

Importance per dataset

	Abalone
Hyperparameter	Performance	Training Time	Inference Time
activation_functions	14.77	1.39	4.39
batch_size	0.55	56.72	21.61
loss	0.0	1.62	0.0
optimizer	2.96	7.99	3.5
learning_rate	30.02	6.9	0.07
hidden_layer_dim	7.16	0.12	15.69
hidden_layer_size	11.55	4.35	11.04

	Bike Sharing
Hyperparameter	Performance	Training Time	Inference Time
activation_functions	51.26	0.59	24.54
batch_size	0.74	72.21	29.71
loss	0.06	0.0	0.0
optimizer	17.86	6.28	0.02
learning_rate	11.6	5.17	7.14
hidden_layer_dim	0.0	1.98	14.41
hidden_layer_size	2.62	1.16	0.82

	Compas
Hyperparameter	Performance	Training Time	Inference Time
activation_functions	3.4	0.4	0.08
batch_size	1.16	43.0	6.23
loss	33.98	0.19	0.0
optimizer	21.68	4.02	4.16
learning_rate	9.59	6.06	0.02
hidden_layer_dim	0.76	2.92	49.31
hidden_layer_size	3.61	7.49	20.06

	Covertype
Hyperparameter	Performance	Training Time	Inference Time
activation_functions	29.22	12.77	4.01
batch_size	0.77	56.92	41.6
loss	0.06	0.0	10.34
optimizer	8.29	1.65	4.67
learning_rate	23.64	0.32	0.17
hidden_layer_dim	13.27	0.2	3.32
hidden_layer_size	1.84	4.79	0.62

	Delays Zurich
Hyperparameter	Performance	Training Time	Inference Time
activation_functions	0.37	3.57	5.2
batch_size	0.0	58.2	57.82
loss	39.27	0.0	0.01
optimizer	14.39	2.42	0.0
learning_rate	0.18	0.58	0.5
hidden_layer_dim	2.37	10.18	12.22
hidden_layer_size	3.81	0.48	3.92

	Higgs
Hyperparameter	Performance	Training Time	Inference Time
activation_functions	11.51	0.49	3.73
batch_size	2.46	48.6	69.07
loss	0.01	0.14	2.25
optimizer	24.08	8.67	0.63
learning_rate	30.84	1.22	0.15
hidden_layer_dim	0.09	7.68	4.75
hidden_layer_size	0.18	3.39	1.25
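The All Datasets table is consistent with the unweighted mean of the six per-dataset tables. The sketch below recomputes the Performance column from the values above; the results agree with the reported figures to within 0.01 (the learning_rate and hidden_layer_size means fall exactly on a rounding boundary).

```python
from statistics import mean

# Performance-column importances copied from the six per-dataset tables above,
# in the order Abalone, Bike Sharing, Compas, Covertype, Delays Zurich, Higgs.
PERFORMANCE_IMPORTANCE = {
    "activation_functions": [14.77, 51.26, 3.4, 29.22, 0.37, 11.51],
    "batch_size": [0.55, 0.74, 1.16, 0.77, 0.0, 2.46],
    "loss": [0.0, 0.06, 33.98, 0.06, 39.27, 0.01],
    "optimizer": [2.96, 17.86, 21.68, 8.29, 14.39, 24.08],
    "learning_rate": [30.02, 11.6, 9.59, 23.64, 0.18, 30.84],
    "hidden_layer_dim": [7.16, 0.0, 0.76, 13.27, 2.37, 0.09],
    "hidden_layer_size": [11.55, 2.62, 3.61, 1.84, 3.81, 0.18],
}


def average_importance(per_dataset):
    """Unweighted mean across datasets, rounded to two decimals like the tables."""
    return {name: round(mean(values), 2) for name, values in per_dataset.items()}
```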

Performance metrics

Best performing hyperparameter combination per dataset

Dataset	Activation function	Batch size	Hidden layer dimension	Loss function	Optimizer	Learning rate	MSE (regression) / MCC (classification)	Training time	Inference time
	Regression
Abalone	relu	256	[224, 192, 608, 768, 800]	mean_squared_error	adam	0.001	2.158	1.928	0.107
Bike Sharing	selu	1024	[352, 32, 288, 32, 544, 704, 96]	mean_squared_error	adam	0.001	59.748	3.621	0.128
Delays Zurich	relu	128	[640, 416, 576, 192, 288, 32, 32]	mean_squared_error	adam	0.001	3.101	73.694	0.286
	Classification
Compas	relu	512	[512, 512, 512, 512]	categorical_crossentropy	adam	0.001	0.041	1.567	0.118
Covertype	relu	512	[1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024]	categorical_crossentropy	adam	0.001	0.828	74.544	0.199
Higgs	softsign	512	[224, 480, 64, 96, 768, 32, 928]	categorical_crossentropy	adam	0.001	0.415	50.935	0.239

Baseline vs Best vs Worst comparison

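The best and worst models were selected based on the performance metric: MSE for the regression datasets (lower is better) and MCC for the classification datasets (higher is better).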

Dataset	Baseline	Best model	Worst model
	Performance (MSE/MCC)
	Regression
Abalone	2.289	2.158	9.295
Bike Sharing	84.045	59.748	100.139
Delays Zurich	3.107	3.101	154.627
	Classification
Compas	0.022	0.041	0
Covertype	0.812	0.828	-0.001
Higgs	0.256	0.415	0
	Training Time
Abalone	1.465	1.928	2.554
Bike Sharing	4.67	3.621	3.014
Delays Zurich	12.74	73.694	7.25
Compas	1.088	2.342	1.121
Covertype	37.381	74.544	4.987
Higgs	21.161	50.935	4.329
	Inference Time
Abalone	0.11	0.107	0.101
Bike Sharing	0.132	0.128	0.122
Delays Zurich	0.136	0.286	0.149
Compas	1.088	0.11	1.121
Covertype	0.173	0.199	0.172
Higgs	0.173	0.239	0.182
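To compare the best model with the baseline across both kinds of metric, the sign has to follow the metric's direction: a lower MSE is better, while a higher MCC is better. For example, on Bike Sharing the best model lowers the MSE from 84.045 to 59.748, a relative improvement of about 28.9%. A minimal helper (hypothetical, not part of the repository):

```python
def relative_improvement(baseline, best, higher_is_better):
    """Relative improvement of `best` over `baseline`, in percent.

    Use higher_is_better=False for MSE (regression) and True for MCC (classification).
    """
    if baseline == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    change = (best - baseline) / abs(baseline)
    return 100.0 * (change if higher_is_better else -change)
```

With the table values, relative_improvement(84.045, 59.748, False) gives roughly 28.9 for Bike Sharing, and relative_improvement(0.256, 0.415, True) gives roughly 62.1 for Higgs.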

Authors

Rafael Teixeira - rgtzths

License

This project is licensed under the MIT License; see the LICENSE file for details.

Citation

If you use this code, please cite our work: Teixeira, Rafael & Antunes, Mário & Sobral, Rúben & Martins, João & Gomes, Diogo & Aguiar, Rui. (2023). Exploring the Intricacies of Neural Network Optimization. DOI: 10.1007/978-3-031-45275-8_2.
