[ENH] randomization/derandomization tag and conditional test logic

Question

fkiraly opened this issue 2 months ago · comments

There are now many estimators in the code base which are excepted from test_fit_idempotent, the test that checks reproducibility when calling fit. It checks whether predict after fit on the same data, with random_state set, leads to the same results.
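To make the check concrete, here is a minimal sketch of the idea in plain Python. It is not the actual test_fit_idempotent code; SeededToy, LeakyToy and fit_is_idempotent are hypothetical names made up for illustration, and refitting the same instance is an assumption about how such a check could be set up.

```python
import random


class SeededToy:
    """Toy estimator (hypothetical): random_state controls all its randomness."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X):
        # a fresh generator per fit, seeded from random_state
        rng = random.Random(self.random_state)
        self.offset_ = rng.random()
        return self

    def predict(self, X):
        return [x + self.offset_ for x in X]


class LeakyToy(SeededToy):
    """Toy estimator (hypothetical): random_state covers only part of the randomness."""

    def fit(self, X):
        super().fit(X)
        # extra noise drawn from OS entropy, which random_state cannot reach
        self.offset_ += random.SystemRandom().random()
        return self


def fit_is_idempotent(estimator_cls, X, random_state=42):
    """Fit the same instance twice on the same data and compare predictions."""
    est = estimator_cls(random_state=random_state)
    first = est.fit(X).predict(X)
    second = est.fit(X).predict(X)
    return first == second
```

With these toys, fit_is_idempotent(SeededToy, [0.0, 1.0]) holds, while LeakyToy fails the same check even though it accepts random_state.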

These estimators are excepted for one of two reasons: either they are non-deterministic and have no random_state parameter to derandomize them, or they have a random_state parameter but setting it does not fully derandomize the estimator.

Instead of adding more and more exceptions, I think we should handle this situation with a tag, and skip test_fit_idempotent for some values of that tag.

The four situations are:

estimator does not have random_state parameter, is deterministic
estimator does not have random_state parameter, is stochastic
estimator does have random_state parameter, setting it derandomizes it
estimator does have random_state parameter, setting it leaves it still stochastic (only partially derandomizes)

We could address these by a single or multiple tags:

a single tag with four values, e.g., the tag called "randomness", values "deterministic", "stochastic", and some other descriptive names we need to come up with
a tag "capability:derandom", combined with "property:stochastic" (true/false); if not stochastic, derandom is always true
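As a rough sketch of how either option could drive the skip decision, assuming tags are available as a plain dict: the tag names "randomness", "capability:derandom" and "property:stochastic" and the values "deterministic" and "stochastic" are taken from the proposal above, while "derandomizable", "partially_derandomizable" and both function names are placeholders invented here.

```python
# Option 1: single four-valued tag, mapped to "run test_fit_idempotent?".
RUN_FIT_IDEMPOTENT = {
    "deterministic": True,              # no random_state, deterministic
    "stochastic": False,                # no random_state, stochastic
    "derandomizable": True,             # placeholder name: random_state derandomizes
    "partially_derandomizable": False,  # placeholder name: still stochastic
}


def should_run_single_tag(tags):
    """Decide from the single "randomness" tag (option 1)."""
    return RUN_FIT_IDEMPOTENT[tags["randomness"]]


def should_run_two_tags(tags):
    """Decide from the two boolean tags (option 2)."""
    stochastic = tags["property:stochastic"]
    # if not stochastic, derandom is treated as always true
    derandom = tags["capability:derandom"] if stochastic else True
    return derandom
```

Under this reading, the two-tag option yields three distinct combinations, so the two "still stochastic" situations (with and without random_state) map to the same pair of values. That makes no difference for the skip decision, but it would matter if the tag is also meant to document whether random_state exists.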

FYI @benHeid, @yarnabrina due to estimators you've recently looked at; FYI @jmwhyte due to the discussion on random_state we had a while ago.

Franz Király · Answer 1 · Mon Apr 08 2024 08:42:56 GMT+0800 (China Standard Time)

related: sktime/skbase#279

Franz Király · Answer 2 · Wed Apr 17 2024 03:27:48 GMT+0800 (China Standard Time)

related: #6287