jma127 / pyltr

Python learning to rank (LTR) toolkit

How to use the GridSearchCV module in sklearn to tune parameters?

jxfruit opened this issue

Hi, thanks for your work, which is very valuable. As the title says, how can I use the GridSearchCV module in sklearn to tune parameters automatically? I looked into a few approaches, for example inheriting from the classes BaseEstimator and RegressorMixin in module _modle.py. But I ran into a problem; the error output is as follows:

  File "E:/Python projects/others/lambdamart_t.py", line 105, in training_model
    gscv.fit(training_set[1], training_set[2], groups=training_set[0])
  File "E:\Python35\lib\site-packages\sklearn\model_selection\_search.py", line 638, in fit
    cv.split(X, y, groups)))
  File "E:\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py", line 779, in __call__
    while self.dispatch_one_batch(iterator):
  File "E:\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py", line 625, in dispatch_one_batch
    self._dispatch(tasks)
  File "E:\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py", line 588, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "E:\Python35\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 111, in apply_async
    result = ImmediateResult(func)
  File "E:\Python35\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 332, in __init__
    self.results = batch()
  File "E:\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "E:\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "E:\Python35\lib\site-packages\sklearn\model_selection\_validation.py", line 437, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
TypeError: fit() missing 1 required positional argument: 'qids'

I also implemented the get_params() and set_params() methods in module lambdamart.py, but the same error occurs. The implementation is as follows:

def get_params(self, deep=True):
    # Return every constructor parameter by name.
    return {'metric': self.metric,
            'learning_rate': self.learning_rate,
            'n_estimators': self.n_estimators,
            'query_subsample': self.query_subsample,
            'subsample': self.subsample,
            'min_samples_split': self.min_samples_split,
            'min_samples_leaf': self.min_samples_leaf,
            'max_depth': self.max_depth,
            'random_state': self.random_state,
            'max_features': self.max_features,
            'verbose': self.verbose,
            'max_leaf_nodes': self.max_leaf_nodes,
            'warm_start': self.warm_start}

def set_params(self, **params):
    """Set the parameters of this estimator.

    # Arguments
        **params: Dictionary of parameter names mapped to their values.
    # Returns
        self
    """
    for parameter, value in params.items():
        setattr(self, parameter, value)
    return self
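
A hand-written parameter dictionary like the one above has to be kept in sync with the constructor. As a rough sketch of an alternative, the parameter names can be read from the `__init__` signature, which is similar in spirit to what sklearn's BaseEstimator does. The names `ParamsMixin` and `ToyRanker` are hypothetical and only for illustration, and the sketch assumes each constructor argument is stored on `self` under the same name. Note that this only tidies up parameter handling; it does not address the TypeError, which comes from the `fit()` signature.

```python
import inspect


class ParamsMixin:
    """Derive get_params/set_params from the constructor signature.

    Assumes every __init__ argument is stored on self under the same name.
    """

    @classmethod
    def _param_names(cls):
        sig = inspect.signature(cls.__init__)
        return sorted(
            name
            for name, p in sig.parameters.items()
            if name != "self"
            and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
        )

    def get_params(self, deep=True):
        # `deep` is accepted for interface compatibility but unused here.
        return {name: getattr(self, name) for name in self._param_names()}

    def set_params(self, **params):
        valid = set(self._param_names())
        for name, value in params.items():
            if name not in valid:
                raise ValueError(
                    "Invalid parameter {!r} for {}".format(
                        name, type(self).__name__
                    )
                )
            setattr(self, name, value)
        return self


class ToyRanker(ParamsMixin):  # hypothetical stand-in for a model class
    def __init__(self, learning_rate=0.1, n_estimators=100, max_depth=3):
        self.learning_rate = learning_rate
        self.n_estimators = n_estimators
        self.max_depth = max_depth
```

Rejecting unknown names in `set_params` catches typos in a parameter grid early instead of silently setting a new attribute.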

This is how I call it:
gscv = GridSearchCV(pyltr.models.LambdaMART(), params_lst, scoring=pyltr.metrics.AUCROC.calc_mean, n_jobs=1, cv=5, verbose=1)
gscv.fit(training_set[1], training_set[2], groups=training_set[0])
Here training_set is a 3-tuple containing training_qids, training_data and training_labels, in that order, all of which are arrays.
Looking forward to your reply, thanks very much!

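It seems that the extra qids parameter is not supported by this grid search class. As the traceback shows, GridSearchCV passes groups only to the CV splitter (cv.split(X, y, groups)), so the estimator is called as fit(X_train, y_train) without qids.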
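
One possible workaround is to run the grid search by hand, so that qids can be passed to `fit()` and to the metric explicitly. The following is a minimal sketch using only the standard library, not a tested pyltr recipe. The helper names (`param_grid`, `group_folds`, `take`, `grid_search`) are hypothetical, and it assumes a model with `fit(X, y, qids)` and `predict(X)` and a metric callable taking `(qids, y_true, y_pred)` where higher is better. It works on plain sequences; with NumPy arrays the `take` helper would be replaced by array indexing, and sklearn's ParameterGrid and GroupKFold could stand in for the first two helpers.

```python
from itertools import product


def param_grid(grid):
    """Yield one dict per combination in a {name: [values]} grid."""
    names = sorted(grid)
    for values in product(*(grid[n] for n in names)):
        yield dict(zip(names, values))


def group_folds(qids, n_splits):
    """Yield (train_idx, test_idx) so that no query is split across folds."""
    unique = list(dict.fromkeys(qids))  # distinct qids, first-seen order
    if n_splits > len(unique):
        raise ValueError("n_splits exceeds the number of distinct qids")
    for k in range(n_splits):
        held_out = set(unique[k::n_splits])
        test = [i for i, q in enumerate(qids) if q in held_out]
        train = [i for i, q in enumerate(qids) if q not in held_out]
        yield train, test


def take(seq, idx):
    """Select the items of seq at the given positions."""
    return [seq[i] for i in idx]


def grid_search(make_model, grid, metric, qids, X, y, n_splits=5):
    """Return (best_params, best_score); a higher metric is assumed better."""
    best_params, best_score = None, float("-inf")
    for params in param_grid(grid):
        scores = []
        for train, test in group_folds(qids, n_splits):
            model = make_model(**params)
            # qids are passed to fit() explicitly here.
            model.fit(take(X, train), take(y, train), take(qids, train))
            pred = model.predict(take(X, test))
            scores.append(metric(take(qids, test), take(y, test), pred))
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params, best_score
```

Splitting by qid rather than by row keeps all documents of a query in the same fold, which matters for ranking metrics that are computed per query.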