[ENH] randomization/derandomization tag and conditional test logic
fkiraly opened this issue · comments
There are now many estimators in the code base which are excepted from the tests that check reproducibility when calling `fit`, namely `test_fit_idempotent`. This test checks whether `predict` after `fit` on the same data, with `random_state` set, leads to the same results.
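For readers less familiar with the check, here is a minimal sketch of the idea. `ToyEstimator` and `check_fit_idempotent` are hypothetical names made up for this illustration; this is not the actual sktime test, only the same idea in a few lines.

```python
import random


class ToyEstimator:
    """Hypothetical stochastic estimator, not part of sktime."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, y):
        # a fresh generator per fit, seeded by random_state (None = unseeded)
        rng = random.Random(self.random_state)
        self.offset_ = rng.random()
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, n):
        return [self.mean_ + self.offset_] * n


def check_fit_idempotent(estimator, y, n=3):
    """Return True if fit followed by predict gives equal results on two runs."""
    first = estimator.fit(y).predict(n)
    second = estimator.fit(y).predict(n)
    return first == second


y = [1.0, 2.0, 3.0]
# with a fixed seed, both runs draw the same offset
print(check_fit_idempotent(ToyEstimator(random_state=42), y))
```

With `random_state=None` the same check would almost surely fail, which is the behaviour that currently forces an exception entry for such estimators.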
The reason these estimators are excepted is that they are either non-deterministic and have no `random_state` parameter to derandomize them, or they have the `random_state` parameter but it does not fully derandomize the estimator.
Instead of excepting more and more estimators, I think we should deal with this situation with a tag, and skip the `test_fit_idempotent` test for some values of that tag.
The four situations are:
- estimator does not have `random_state` parameter, is deterministic
- estimator does not have `random_state` parameter, is stochastic
- estimator does have `random_state` parameter, setting it derandomizes it
- estimator does have `random_state` parameter, setting it leaves it still stochastic (only partially derandomizes)
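The four situations form a two-by-two grid, which can be sketched as below. The enum, the member names `DERANDOMIZED` and `PARTIALLY_DERANDOMIZED`, and the helper `classify` are all hypothetical and only for illustration; here `reproducible` is assumed to mean "repeated `fit` gives the same result, with `random_state` set if the parameter exists".

```python
from enum import Enum


class Randomness(Enum):
    """Hypothetical labels for the four situations."""

    DETERMINISTIC = "deterministic"  # no random_state, deterministic
    STOCHASTIC = "stochastic"  # no random_state, stochastic
    DERANDOMIZED = "derandomized"  # random_state fully derandomizes
    PARTIALLY_DERANDOMIZED = "partially_derandomized"  # still stochastic


def classify(has_random_state: bool, reproducible: bool) -> Randomness:
    """Map the two yes/no questions onto one of the four situations."""
    if has_random_state:
        if reproducible:
            return Randomness.DERANDOMIZED
        return Randomness.PARTIALLY_DERANDOMIZED
    if reproducible:
        return Randomness.DETERMINISTIC
    return Randomness.STOCHASTIC


print(classify(has_random_state=True, reproducible=False))
```

Under this reading, only `DETERMINISTIC` and `DERANDOMIZED` estimators can be expected to pass an idempotency check; that split is my assumption, not something decided yet.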
We could address these by a single or multiple tags:
- a single tag with four values, e.g., a tag called `"randomness"`, with values `"deterministic"`, `"stochastic"`, and some other descriptive names we need to come up with
- a tag `"capability:derandom"`, combined with `"property:stochastic"` (true/false); if not stochastic, derandom is always true
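A rough sketch of how the two-tag variant could drive the conditional test logic, using plain dicts in place of real estimator tags. The helper names `validate_tags` and `should_skip_fit_idempotent` are hypothetical, and the skip rule (skip when stochastic and not derandomizable) is an assumption about how the tags would be used.

```python
def validate_tags(tags: dict) -> None:
    """Enforce the rule: if not stochastic, derandom is always true."""
    if not tags["property:stochastic"] and not tags["capability:derandom"]:
        raise ValueError(
            "a non-stochastic estimator must have capability:derandom=True"
        )


def should_skip_fit_idempotent(tags: dict) -> bool:
    """Skip the idempotency test only if randomness cannot be removed."""
    validate_tags(tags)
    return tags["property:stochastic"] and not tags["capability:derandom"]


# hypothetical tag sets for three kinds of estimators
deterministic = {"property:stochastic": False, "capability:derandom": True}
seedable = {"property:stochastic": True, "capability:derandom": True}
not_seedable = {"property:stochastic": True, "capability:derandom": False}

for tags in (deterministic, seedable, not_seedable):
    print(tags, "-> skip:", should_skip_fit_idempotent(tags))
```

Note that under this reading the two booleans distinguish three cases rather than four: a stochastic estimator without `random_state` and one that is only partially derandomized both end up as stochastic with `capability:derandom` false. If that distinction matters, the single four-valued tag would be the better fit.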
FYI @benHeid, @yarnabrina due to estimators you've recently looked at; FYI @jmwhyte due to the discussion on `random_state` we had a while ago.
related: sktime/skbase#279
related: #6287