sktime / sktime

A unified framework for machine learning with time series

Home Page: https://www.sktime.net

[ENH] randomization/derandomization tag and conditional test logic

fkiraly opened this issue

There are now many estimators in the code base that are excepted from the test that checks reproducibility when calling fit, test_fit_idempotent. This test checks whether predict after fit on the same data, with random_state set, leads to the same results.
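To make the property concrete, here is a minimal sketch of an idempotency check of this kind. It is not the actual sktime test: `ToyForecaster` and `fit_is_idempotent` are hypothetical names, and refitting the same instance twice is an assumption about how such a check could be written.

```python
import random


class ToyForecaster:
    """Hypothetical toy estimator, not an sktime class."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, y):
        # A fresh generator per fit call, seeded from random_state.
        # With random_state=None the seed comes from the system, so
        # repeated fits will generally differ.
        rng = random.Random(self.random_state)
        self.mean_ = sum(y) / len(y)
        self.noise_ = rng.random()
        return self

    def predict(self):
        return self.mean_ + self.noise_


def fit_is_idempotent(estimator, y):
    """Fit twice on the same data and compare the predictions."""
    first = estimator.fit(y).predict()
    second = estimator.fit(y).predict()
    return first == second
```

With `random_state` set, the toy estimator passes the check; without it, two fits will almost surely give different predictions.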

These estimators are excepted because they are either non-deterministic with no random_state parameter to derandomize them, or they have a random_state parameter that does not fully derandomize the estimator.

Instead of excepting more and more estimators, I think we should handle this situation with a tag, and skip test_fit_idempotent for some of its values.

The four situations are:

  • estimator does not have random_state parameter, is deterministic
  • estimator does not have random_state parameter, is stochastic
  • estimator does have random_state parameter, setting it derandomizes it
  • estimator does have random_state parameter, setting it leaves it still stochastic (only partially derandomizes)
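The four situations can be illustrated with toy classes and an empirical classifier. This is only a sketch: all class names and the `classify` helper are hypothetical, and detecting stochasticity by repeated fitting is a heuristic that can miss rare randomness.

```python
import inspect
import random


class Deterministic:
    def __init__(self):
        pass

    def fit(self, y):
        self.value_ = sum(y) / len(y)
        return self

    def predict(self):
        return self.value_


class StochasticNoSeed(Deterministic):
    def fit(self, y):
        # Uses the global generator; there is no way to seed it via parameters.
        self.value_ = random.random()
        return self


class Seedable(Deterministic):
    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, y):
        self.value_ = random.Random(self.random_state).random()
        return self


class PartiallySeedable(Seedable):
    def fit(self, y):
        seeded = random.Random(self.random_state).random()
        # The second term ignores random_state, so seeding is incomplete.
        self.value_ = seeded + random.random()
        return self


def classify(cls, y, n_repeats=5):
    """Return which of the four situations a class appears to be in."""
    has_param = "random_state" in inspect.signature(cls.__init__).parameters
    kwargs = {"random_state": 0} if has_param else {}
    # Collect predictions from repeated fresh fits; one distinct value
    # means the runs were reproducible.
    results = {cls(**kwargs).fit(y).predict() for _ in range(n_repeats)}
    reproducible = len(results) == 1
    if has_param:
        return "derandomized by random_state" if reproducible else "partially derandomized"
    return "deterministic" if reproducible else "stochastic"
```

The labels returned by `classify` are placeholders for this sketch, not proposed tag values.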

We could address these by a single or multiple tags:

  • a single tag with four values, e.g., a tag called "randomness" with values "deterministic", "stochastic", and other descriptive names we still need to come up with
  • a tag "capability:derandom", combined with "property:stochastic" (true/false); if not stochastic, derandom is always true
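As a rough sketch of the two-tag option, the conditional test logic could look like the following. The tag names are the ones proposed above; the helper `should_run_fit_idempotent`, the plain-dict representation of tags, and the default values are assumptions for illustration, not existing sktime API.

```python
def should_run_fit_idempotent(tags):
    """Decide whether the idempotency test applies, given a tag dict.

    Assumed defaults: an estimator without these tags is treated as
    deterministic, and therefore as trivially derandomizable.
    """
    stochastic = tags.get("property:stochastic", False)
    if not stochastic:
        # Deterministic estimators should always pass, so always test them.
        return True
    # Stochastic estimators are only tested if random_state fully
    # derandomizes them; otherwise the test is skipped.
    return tags.get("capability:derandom", True)


# Example tag dicts for the four situations (hypothetical values).
deterministic = {"property:stochastic": False}
stochastic_no_seed = {"property:stochastic": True, "capability:derandom": False}
seedable = {"property:stochastic": True, "capability:derandom": True}
partially_seedable = {"property:stochastic": True, "capability:derandom": False}
```

Under this scheme, the stochastic estimator without random_state and the partially derandomized one carry the same tag values; they differ only in whether the random_state parameter exists, which can be read from the signature.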

FYI @benHeid, @yarnabrina due to estimators you've recently looked at; FYI @jmwhyte due to the discussion on random_state we had a while ago.

related: #6287