sktime / sktime

A unified framework for machine learning with time series

Home Page: https://www.sktime.net

[BUG] logic problem with new CI - coverage of `TestAllEstimators` and `TestAllObjects` incorrect

fkiraly opened this issue · comments

I think there is a bug in the new CI. The coverage is not affected since "old CI" is still running.

The issue occurs when, say, a forecaster changes.

The new CI will then trigger tests in sktime.forecasting, but not in sktime.tests, which contains TestAllEstimators.
So, we run the forecaster-specific tests for the changed forecaster, but not the object- and estimator-specific tests.

FYI @yarnabrina, is there a way to fix this?
The "one liner" to "run all tests for estimator X" is check_estimator - however, this is intentionally avoided in the original setup as an entry point since this distributes less well across workers, via pytest-xdist.

The "opposite case" is where there are changes in "other" modules, as well as in a forecaster. The TestAllEstimators test for that forecaster then runs in the "other" test job, not in the "module/forecasters" bucket:
https://github.com/sktime/sktime/actions/runs/8861405811/job/24333294008?pr=6348

As I just commented on Discord as well, this job is not coming from the "other" case in the new CI. It's coming from the old CI, which relies on the Makefile; specifically, this line is being called to run the tests:

run: make test_without_datasets

The new CI jobs are invoked by this line instead:

run: >-
python3
-m pytest
sktime
--ignore sktime/base
--ignore sktime/datasets
--ignore sktime/alignment
--ignore sktime/annotation
--ignore sktime/classification
--ignore sktime/clustering
--ignore sktime/forecasting
--ignore sktime/networks
--ignore sktime/param_est
--ignore sktime/regression
--ignore sktime/transformations
--matrixdesign True
--only_changed_modules True

Sorry, wrong link:
https://github.com/sktime/sktime/actions/runs/8861405811/job/24334091449?pr=6348

This is coming from "other" in the new CI (screenshot attached).

The solution could be to run the tests in sktime/tests in all "module" jobs; that will at least ensure that the estimator/object tests are run for changed forecasters.

We cannot turn them off in "other", as estimators may live in "other" modules. But that duplication will trigger only if an "other" and a "module" estimator are changed in the same PR, so the incidence is perhaps low.
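For illustration, assuming the per-module jobs pass the same --matrixdesign and --only_changed_modules flags as the "other" job above, the "module/forecasters" bucket with that change would be invoked roughly like this:

# proposed: add sktime/tests so TestAllEstimators/TestAllObjects also run for changed forecasters
python3 -m pytest sktime/forecasting sktime/tests --matrixdesign True --only_changed_modules True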

I've created PR #6353, please take a look. But based on sample runs on my laptop, this is going to make the new CI significantly slower, even the test collection part. For example, the times before and after the change for forecasting with just test collection (--co flag) are 21.85 seconds (43669 tests) vs 314.01 seconds (79603 tests).

Only test collection?

Before we merge, then, can we diagnose where this is coming from?

Collection should certainly not take so long.

I just checked construction times with the code below. It looks like construction is most likely not the culprit.
Most estimators construct instantaneously; some need tens of milliseconds, which is probably too long (but does not explain collection). It could of course be that I am missing something in a soft dependency I do not have installed.

from time import time

from sktime.registry import all_estimators
from sktime.utils.validation._dependencies import _check_estimator_deps

# all estimator classes registered in sktime
ests = all_estimators(return_names=False)


def _measure_init_time(cls, params=None):
    """Time a single construction of cls with the given parameter dict."""
    params = params or {}
    start = time()
    try:
        cls(**params)
    except Exception:
        # construction failures are ignored, we only care about the timing
        pass
    end = time()
    return end - start


times = []

# time construction for every test parameter set of every estimator
# whose soft dependencies are available in the current environment
for est in ests:
    if _check_estimator_deps(est, severity="none"):
        params = est.get_test_params()
        if not isinstance(params, list):
            params = [params]
        for param in params:
            times.append((est, _measure_init_time(est, param)))
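
To spot outliers, the collected pairs can then be sorted by construction time, for example:

# ten slowest constructions, as (estimator class, seconds) pairs
for est, t in sorted(times, key=lambda x: x[1], reverse=True)[:10]:
    print(f"{est.__name__}: {t:.3f} s")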

Just in case it's device- or OS-specific, what are your times for the cases reported above? I used these commands:

# pre-PR
python -m pytest sktime/forecasting --co

# post-PR
python -m pytest sktime/forecasting sktime/tests --co

I just checked construction times with the code below. It looks like construction is most likely not the culprit.
Most estimators construct instantaneously; some need tens of milliseconds, which is probably too long (but does not explain collection). It could of course be that I am missing something in a soft dependency I do not have installed.

I don't think construction or soft dependencies affect the two commands I shared above, though I am not 100% confident.

I ran the first command on main, Python 3.11, Windows - I cancelled it after it had been running for 10 minutes.

That is very odd. Last time we measured test collection time, it was 15 sec (minimal soft deps) or 1 min (many extras).

Update:

  • 1st command: 661 secs (11 min)
  • 2nd command: ran for >20 min, then I terminated it

My show_versions:

Python dependencies:
          pip: 23.3.1
       sktime: 0.28.1
      sklearn: 1.3.2
       skbase: 0.7.6
        numpy: 1.26.2
        scipy: 1.11.4
       pandas: 2.1.4
   matplotlib: 3.8.3
       joblib: 1.3.2
        numba: 0.59.1
  statsmodels: 0.14.0
     pmdarima: 2.0.4
statsforecast: None
      tsfresh: 0.20.2
      tslearn: None
        torch: None
   tensorflow: None
tensorflow_probability: None

What are your timings?

I also note how strange this is - isn't 600s the standard timeout for pytest collection?

I think we are getting closer - I updated issue #6344 with the problem description. I will now run a profiler on the test collection that takes too long.
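
One way to do that (paths and output file name are illustrative) is to run the collection-only pytest call under cProfile and inspect the cumulative times:

import cProfile
import pstats

import pytest

# profile only test collection (--co) for the forecasting module
cProfile.run('pytest.main(["sktime/forecasting", "--co", "-q"])', "collect.prof")

# show the 25 entries with the largest cumulative time
pstats.Stats("collect.prof").sort_stats("cumulative").print_stats(25)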

For example, the times before and after the change for forecasting with just test collection (--co flag) are 21.85 seconds (43669 tests) vs 314.01 seconds (79603 tests).

These are my times. But I wonder why mine are so much faster than yours!

Idea: I don't have pytest-xdist or other plugins installed, so I disabled those in the configuration (setup.cfg). Can you try that once?
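
If editing setup.cfg is inconvenient, the plugins can also be disabled for a single run from the command line; which plugins are actually installed in your environment is an assumption here:

# pre-PR command with pytest-xdist and pytest-cov disabled for this run only
python -m pytest sktime/forecasting --co -p no:xdist -p no:cov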

I also note how strange this is - isn't 600s the standard timeout for pytest collection?

Yes, it's configured in setup.cfg. My guess is that it applies to test execution, not to collection. I don't know how to check that.
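
One crude way to check, assuming the 600s value comes from pytest-timeout, is to set a very small timeout on a collection-only run; if collection keeps running well past it, the timeout evidently applies per test item and does not bound collection:

# if this is not interrupted after ~5 seconds, the timeout does not cover collection
python -m pytest sktime/forecasting --co -q --timeout 5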

That is very odd. 20 sec is more in line with the times from this older issue: #4900

Should we move the test times discussion to #6344?