parameters still to implement (for now ignored): verbose, nesterovs_momentum
(!) Initialize weights with small random values near zero
(!) Momentum term in the weight update
(!) Divide the gradient by n_samples (or by the batch size for mini-batches)
(!) Stop training after an excessive number of epochs (hard cap)
(!) Shuffle the training patterns each epoch
(!) Try a number of random starting configurations (e.g. 10 or more training runs/trials) and keep the best
(!) Check the learning curve (loss per epoch)
(!) Table and plots of the results
(!) Regularization
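Most of the checklist items above can be sketched together in one minimal training loop. The sketch below is illustrative only (NumPy, a single hidden layer with tanh units and squared loss -- an assumed architecture, not this project's actual one): near-zero initialization, per-epoch shuffling, gradients averaged over the batch, classical momentum, an L2 penalty (alpha), a hard epoch cap, a recorded learning curve, and multiple random restarts.

```python
import numpy as np

def train_mlp(X, y, n_hidden=8, lr=0.02, momentum=0.9, alpha=1e-4,
              batch_size=32, max_iter=200, seed=0):
    """One training run of a single-hidden-layer MLP (tanh, squared loss)."""
    rng = np.random.default_rng(seed)
    n_samples, n_in = X.shape
    # (!) initialize weights with small random values near zero
    W1 = rng.normal(scale=0.1, size=(n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=n_hidden);         b2 = 0.0
    vW1, vb1 = np.zeros_like(W1), np.zeros_like(b1)
    vW2, vb2 = np.zeros_like(W2), 0.0
    curve = []                         # (!) record the learning curve
    for epoch in range(max_iter):      # (!) hard cap on the number of epochs
        order = rng.permutation(n_samples)   # (!) shuffle every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            H = np.tanh(Xb @ W1 + b1)
            err = H @ W2 + b2 - yb
            # (!) divide the gradient by the batch size; alpha = L2 penalty
            gW2 = H.T @ err / len(idx) + alpha * W2
            gb2 = err.mean()
            dH = np.outer(err, W2) * (1.0 - H ** 2)
            gW1 = Xb.T @ dH / len(idx) + alpha * W1
            gb1 = dH.mean(axis=0)
            # (!) classical momentum update
            vW1 = momentum * vW1 - lr * gW1; W1 = W1 + vW1
            vb1 = momentum * vb1 - lr * gb1; b1 = b1 + vb1
            vW2 = momentum * vW2 - lr * gW2; W2 = W2 + vW2
            vb2 = momentum * vb2 - lr * gb2; b2 = b2 + vb2
        pred = np.tanh(X @ W1 + b1) @ W2 + b2
        curve.append(np.mean((pred - y) ** 2) / 2.0)
    return curve[-1], curve, (W1, b1, W2, b2)

def train_with_restarts(X, y, n_restarts=10, **kw):
    # (!) several random starting configurations; keep the best final loss
    return min((train_mlp(X, y, seed=s, **kw) for s in range(n_restarts)),
               key=lambda run: run[0])
```

Plotting `curve` against the epoch index gives the learning-curve check; a flat or rising curve suggests the learning rate or momentum needs tuning.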
implemented parameters: hidden_layer_sizes, hidden_activation, output_activation, alpha, batch_size, max_iter, shuffle, warm_start, momentum, loss, solver, random_state, learning_rate, learning_rate_init, power_t, tol, n_iter_no_change, early_stopping, validation_fraction
ignored parameters: beta_1, beta_2, epsilon, max_fun
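Most of these names mirror scikit-learn's MLPClassifier constructor (an assumed inspiration; hidden_activation, output_activation, and loss are not MLPClassifier arguments, since scikit-learn exposes a single `activation` for all hidden layers). The shared subset can be exercised against scikit-learn as a behavioral reference:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Tiny XOR-style dataset, repeated so a train pass has enough samples.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]] * 10)
y = np.array([0, 1, 1, 0] * 10)

# Only parameter names from the "implemented" list above that also exist
# in scikit-learn's MLPClassifier are used here.
clf = MLPClassifier(
    hidden_layer_sizes=(8,),
    alpha=1e-4,              # L2 regularization strength
    batch_size=8,
    max_iter=500,            # epoch cap
    shuffle=True,
    momentum=0.9,
    solver="sgd",
    random_state=0,
    learning_rate="constant",
    learning_rate_init=0.1,
    tol=1e-4,
    n_iter_no_change=10,
    early_stopping=False,
)
clf.fit(X, y)
```

After fitting, `clf.loss_curve_` holds the per-epoch loss, which is the same learning-curve check listed above.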