Error with scikit-learn hyperparameter search with MultivariateClassifier

Question

timlac opened this issue 2 years ago

Description

I'm trying to tune the hyperparameters of a MultivariateClassifier wrapping a TimeSeriesForest with scikit-learn's RandomizedSearchCV.

However, I get the following error:

ValueError: Invalid parameter n_estimators for estimator MultivariateClassifier(estimator=TimeSeriesForest()). Check the list of available parameters with estimator.get_params().keys().

Steps/Code to Reproduce

import numpy as np
from pyts.classification import TimeSeriesForest
from sklearn.model_selection import RandomizedSearchCV
from pyts.multivariate.classification import MultivariateClassifier

# some placeholder example data (observations, dimensions, time steps)
X = np.ones((246, 17, 1084))
y = np.ones((246))

n_estimators_values = [int(x) for x in np.linspace(10, 500, num=100)]

parameters = {'n_estimators': n_estimators_values}

rf = MultivariateClassifier(TimeSeriesForest())

clf = RandomizedSearchCV(
    estimator=rf,
    param_distributions=parameters,
    scoring='roc_auc_ovo_weighted',
    verbose=1,
    n_iter=5000,
    random_state=27,
    n_jobs=-1,
)

clf.fit(X, y)

Versions

Python 3.8.13
NumPy 1.22.3
SciPy 1.7.3
Scikit-Learn 1.0.2
Numba 0.56.0
Pyts 0.12.0

Johann Faouzi · Answer 1 · Mon Aug 22 2022 22:41:54 GMT+0800 (China Standard Time)

Hi,

I'm very sorry for the delayed response (I was on vacation). Here is a Stack Overflow post with a similar setting: a one-vs-rest classifier for which one wants to try out several values of the base classifier's hyperparameters. If you run rf.get_params(), you get the following dictionary:

>>> rf.get_params()
{'estimator__bootstrap': True,
 'estimator__ccp_alpha': 0.0,
 'estimator__class_weight': None,
 'estimator__criterion': 'entropy',
 'estimator__max_depth': None,
 'estimator__max_features': 'auto',
 'estimator__max_leaf_nodes': None,
 'estimator__max_samples': None,
 'estimator__min_impurity_decrease': 0.0,
 'estimator__min_samples_leaf': 1,
 'estimator__min_samples_split': 2,
 'estimator__min_weight_fraction_leaf': 0.0,
 'estimator__min_window_size': 1,
 'estimator__n_estimators': 500,
 'estimator__n_jobs': None,
 'estimator__n_windows': 1.0,
 'estimator__oob_score': False,
 'estimator__random_state': None,
 'estimator__verbose': 0,
 'estimator': TimeSeriesForest(),
 'weights': None}

So you can get and set the parameters of the base estimator by using the prefix estimator__ (the name of the wrapper's parameter followed by a double underscore). In your case, you have to replace

parameters = {'n_estimators': n_estimators_values}

with

parameters = {'estimator__n_estimators': n_estimators_values}
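
As a rough illustration of the same naming rule, here is a minimal sketch that uses only scikit-learn. It wraps a RandomForestClassifier in a OneVsRestClassifier (the setting from the Stack Overflow post) instead of using pyts. The synthetic data, the candidate values and the variable names are assumptions made for this example. MultivariateClassifier keeps its base estimator under the same parameter name, estimator, as the get_params() output above shows, so the prefix should be identical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.multiclass import OneVsRestClassifier

# Small synthetic data set (assumption, for illustration only):
# 60 samples, 5 features, 3 balanced classes.
rng = np.random.RandomState(27)
X = rng.normal(size=(60, 5))
y = np.repeat([0, 1, 2], 20)

# A wrapper around a base estimator, analogous to
# MultivariateClassifier(TimeSeriesForest()).
ovr = OneVsRestClassifier(RandomForestClassifier(random_state=27))

# Parameters of the base estimator are exposed under the 'estimator__' prefix.
nested_names = sorted(
    name for name in ovr.get_params() if name.startswith('estimator__')
)

# Setting a nested parameter directly uses the same prefixed name.
ovr.set_params(estimator__n_estimators=10)

# The search space must use the prefixed name as well.
parameters = {'estimator__n_estimators': [5, 10, 20]}

search = RandomizedSearchCV(
    estimator=ovr,
    param_distributions=parameters,
    n_iter=3,
    cv=3,
    random_state=27,
)
search.fit(X, y)
best_n_estimators = search.best_params_['estimator__n_estimators']
```

With the unprefixed key 'n_estimators', the search would raise the same "Invalid parameter" error as in the report, because the wrapper itself has no such parameter.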

I hope this resolves your issue.

Best,
Johann