sktime / sktime

A unified framework for machine learning with time series

Home Page: https://www.sktime.net

[MNT] lack of test coverage of `pandas 2.2.X` *and* deep learning backends

fkiraly opened this issue · comments

As remarked by @yarnabrina on Discord, the current tests lack coverage of estimators that require deep learning backends on pandas 2.2.X.

This is due to neural network backends only being present in test-full, which contains packages that imply pandas < 2.2.X.

Conversely, on the short module tests where pandas is not restricted by other dependencies, deep learning backends do not get installed.

How best to solve this, @sktime/core-developers?

Possible solutions:

  • add the dl dep set in the modules job - this would only work if the dl dep set does not imply pandas < 2.2.X
  • add another batch of tests with the cross-condition
  • estimator-specific environments...

This is due to neural network backends only being present in test-full, which contains packages that imply pandas < 2.2.X.

I may not have explained correctly, but this is not what I meant. What I meant is that currently the dl extra is only tested in the test-full job of the old CI, where all_extras_pandas2 and dev are also installed. As a result of this massive set of packages (and their dependencies), it now resolves to pandas 2.1.4. It may be caused by the dl packages alone, but I have not checked that myself.

With more packages to be added to dl soon (transformers from the Hugging Face PR), this is only going to add more restrictions. I think the only way forward to ensure full coverage will be the per-estimator test idea that @fkiraly proposed earlier (can't find the issue). However, that seems really difficult, as GitHub restricts the number of jobs that can be created from a matrix (256, I think), and we will surely hit that limit (definitely when considering OS and Python versions, but probably on the number of estimators alone).

Another possibility, though I have not thought out the pros/cons/feasibility yet:

  1. run a script to identify all modified estimators (the current subset logic used to decide when to run a test)
  2. identify the dependencies of these estimators
  3. create a job matrix of unique dependencies (e.g. statsmodels, neuralforecast, huggingface+chronos, etc.)
  4. in each job, run all the tests on the modified estimators across several OS/Python versions
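
To make the grouping in steps 2 and 3 concrete, here is a rough sketch in Python - the dependency data is made up for illustration, the real script would read it from the estimators themselves:

```python
from collections import defaultdict

# hypothetical output of step 2: modified estimator -> its soft dependencies
estimator_deps = {
    "NeuralForecastRNN": ["neuralforecast"],
    "NeuralForecastLSTM": ["neuralforecast"],
    "SimpleRNNRegressor": ["tensorflow"],
}

# step 3: group estimators by their unique dependency set,
# so that each group becomes one entry of the CI job matrix
groups = defaultdict(list)
for estimator, deps in estimator_deps.items():
    groups[frozenset(deps)].append(estimator)

matrix_include = [
    {"estimators": sorted(ests), "dependencies": sorted(deps)}
    for deps, ests in groups.items()
]
print(matrix_include)
```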

I may not have explained correctly, but this is not what I meant

I think we do mean the same thing; my formulation was unclear. When I said

"This is due to neural network backends only being present in test-full, which contains packages that imply pandas < 2.2.X."

the "which" was referencing the test-full environment rather than the neural network backends (about which I do not know whether they imply bounds on pandas.

Can you confirm whether you now think that we mean the same, or not?

can't find the issue

I had the same problem and was expecting that you would link it 😁

Now I looked again, and this time I used the tag "testing". Good that I keep tagging issues - it was the third one from the top:
#5719

Another possibility, though I have not thought out the pros/cons/feasibility yet:

How would that work, mechanically?
Do we need a CLI for test collection?

Either way, with the current logic, we need to do env setup, then run python code, then set up an env depending on the output of that, then run tests in that second env.

I would not know how to do this, even though I probably could find out after a few days of research.

Can you confirm whether you now think that we mean the same, or not?

Yes, I meant the same as you.

How would that work, mechanically?

I am not fully clear either, but the idea depends on the assumption that we can create dynamic jobs based on newly generated JSON (or similar) files. In GitLab, it's possible to create new child pipelines with custom specifications and configurations based on previous steps and their artefacts, so I am hoping GitHub Actions has that too.

If this is indeed possible, the rest is pretty straightforward, I think. The remaining steps will be configured by OS name, Python version, and the soft dependency (possibly more than one) to install, and will be very similar to the current flow.
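
For what it's worth, GitHub Actions does support this pattern: one job can write a JSON string to the `$GITHUB_OUTPUT` file as a step output, and a downstream job can consume it via `fromJSON(...)` in its `strategy.matrix`. A minimal sketch of the Python side only - the file name and job wiring here are assumptions, not a worked-out design:

```python
import json
import os

# "test_matrix.json" is assumed to be produced by the step that detects
# affected estimators and their dependencies
with open("test_matrix.json") as f:
    matrix = json.load(f)

# expose the matrix as a step output; a downstream job would then declare
#   strategy:
#     matrix: ${{ fromJSON(needs.<generate-job>.outputs.matrix) }}
with open(os.environ["GITHUB_OUTPUT"], "a") as f:
    f.write(f"matrix={json.dumps(matrix)}\n")
```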

@MEMEO-PRO @sammychinedu2ky @Xinyu-Wu-0000 @duydl since all of you helped in CI discussions before, any suggestions on whether this is feasible or not?

Sorry, I was not active the last few days. @fkiraly, then I think we can try this as per @Xinyu-Wu-0000's suggestion? This is a high-level implementation idea:

  • run a script to identify all modified estimators (the current subset logic used to decide when to run a test)
  • identify the dependencies of these estimators
  • create a job matrix of unique dependencies (e.g. statsmodels, neuralforecast, huggingface+chronos, etc.)
  • in each job, run all the tests on the modified estimators across several OS/Python versions

The tricky part is step 3, which is where we will try the above suggestion. We can create a JSON file of the different estimators and their dependencies, and then that JSON will define the matrix.

Hm, so that would require:

  • specifying the format for the JSON
  • writing Python code that creates the JSON? (that's the only way I can see that does not introduce a substantial manual maintenance burden)

Yes. Hopefully @Xinyu-Wu-0000 can help with the JSON format, and then it should be easy to create it at the end of the Python script that detects "affected" estimators and their dependencies.

Maybe this will work:

{
    "include": [
        {
            "test alias": "foo",
            "estimators": [
                "NeuralForecastRNN"
            ],
            "dependencies": [
                "neuralforecast==1.7.0",
                "statsmodels==0.14.1"
            ]
        },
        {
            "test alias": "bar",
            "estimators": [
                "SimpleRNNRegressor"
            ],
            "dependencies": [
                "tensorflow"
            ]
        }
    ]
}

Dependencies should be easy to get for an estimator if we can use Python - I wrote the `deps` utility in the registry with precisely this in mind.
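
For reference, a minimal sketch of how that lookup could work via the registry and the `python_dependencies` tag (the dedicated utility mentioned above should make this even shorter; the estimator names are taken from the example JSON above):

```python
from sktime.registry import all_estimators

# estimator names detected as "affected" by the change detection step
affected = {"NeuralForecastRNN", "SimpleRNNRegressor"}

estimator_deps = {}
for name, cls in all_estimators(return_names=True):
    if name not in affected:
        continue
    # the python_dependencies tag lists soft dependencies, as str or list of str
    deps = cls.get_class_tag("python_dependencies", None) or []
    if isinstance(deps, str):
        deps = [deps]
    estimator_deps[name] = sorted(deps)

print(estimator_deps)  # e.g. {"NeuralForecastRNN": ["neuralforecast"], ...}
```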