crate / cratedb-examples

A collection of clear and concise examples of how to work with CrateDB.

AutoML: CI trips with `ValueError: Input contains NaN.`

amotl opened this issue

This originally came up in GH-170, an issue that mixed several things together. Let's get things straight here.

Problem

The AutoML job on CI occasionally trips like this, failing the run.

FAILED test.py::test_file[automl_timeseries_forecasting_with_pycaret.py] - ValueError: Input contains NaN.
self = <joblib.parallel.BatchCompletionCallBack object at 0x7f4f737cb910>

    def _return_or_raise(self):
        try:
            if self.status == TASK_ERROR:
>               raise self._result
E               ValueError: Input contains NaN.

-- https://github.com/crate/cratedb-examples/actions/runs/7884792002/job/21514554253#step:6:1146
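
The traceback does not say where the NaN values come from. As a first diagnostic, one could check whether the input series already contains missing values before it reaches the forecaster. The following is a minimal sketch using only the standard library; `nan_positions` is a hypothetical helper, not part of the example script or of PyCaret.

```python
import math


def nan_positions(values):
    """Return the indices of entries that are NaN or None.

    Hypothetical helper for illustration only.
    """
    return [
        i
        for i, v in enumerate(values)
        if v is None or (isinstance(v, float) and math.isnan(v))
    ]


# Made-up sample data, not taken from the failing CI run.
series = [1.0, 2.5, float("nan"), 4.0, None]
print(nan_positions(series))
```

If the data lives in a pandas DataFrame, `df.isna().sum()` gives a per-column count of missing values. An empty result here would suggest that the NaN values are produced later, for example by a model's predictions, rather than by the input.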

Outlook

@andnig shared his suggestions at #170 (comment) already. Maybe you can add them here instead?

Recommendation

@andnig suggested:

To move forward, you could use a different model for the test run, one with a lower MASE.

Thanks!

Rationale

Looking at the failed run, I see that the esm model has an incredibly high MASE and RMSSE. This mostly indicates that the model is not very well suited to the data. I suggested it because it is very lightweight, but it seems to be too lightweight 😓
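
For readers unfamiliar with the two metrics: both scale the forecast error by the in-sample error of a naive forecast, so a value well above 1 means the model does worse than simply repeating the previous observation. The sketch below follows the common textbook definitions in plain Python. It is not PyCaret's implementation, and the function names and the `season` parameter are illustrative assumptions.

```python
import math


def _naive_errors(y_train, season=1):
    # In-sample errors of the naive forecast y[t] = y[t - season].
    return [y_train[t] - y_train[t - season] for t in range(season, len(y_train))]


def mase(y_true, y_pred, y_train, season=1):
    """Mean Absolute Scaled Error (textbook definition)."""
    mae = sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
    naive = _naive_errors(y_train, season)
    scale = sum(abs(e) for e in naive) / len(naive)
    # A constant training series makes the scale zero; this plain-Python
    # version then raises ZeroDivisionError.
    return mae / scale


def rmsse(y_true, y_pred, y_train, season=1):
    """Root Mean Squared Scaled Error (textbook definition)."""
    mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)
    naive = _naive_errors(y_train, season)
    scale = sum(e ** 2 for e in naive) / len(naive)
    return math.sqrt(mse / scale)


# Made-up numbers, not taken from the failing CI run.
y_train = [1, 2, 3, 4, 5]
y_true = [6, 7]
y_pred = [5, 9]
print(mase(y_true, y_pred, y_train))
print(rmsse(y_true, y_pred, y_train))
```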

Hi again. GH-300 changes the script to use a single model exclusively, "ets_cds_dt". Unfortunately, it still trips on CI.

Wasn't the script about using 3 models? I think the later benchmarking operations need at least 3 models, don't they?
Using 1 model without adjusting the later call will probably cause the trainers to fail.
But you'd also see this locally, not only on CI.
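
One way to make such a mismatch fail early, and with a clear message, is to check the number of selected models before handing them to the later step. This is only a sketch: `require_min_models` is a hypothetical helper, and the minimum of 3 is taken from the comment above, not verified against the script.

```python
def require_min_models(models, minimum=3):
    """Fail fast if fewer models were selected than a later step expects.

    Hypothetical guard for illustration; `minimum=3` is an assumption.
    """
    models = list(models)
    if len(models) < minimum:
        raise ValueError(
            f"Expected at least {minimum} models for the benchmarking step, "
            f"got {len(models)}: {models}"
        )
    return models


# With a single model, the guard reports the problem immediately.
try:
    require_min_models(["ets_cds_dt"])
except ValueError as exc:
    print(exc)
```

In a PyCaret script, the checked list would typically be the result of `compare_models(..., n_select=3)`, although the script's actual calls are not shown in this thread.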

Ah, all right. It looks like I didn't know what I was doing at all. Thanks!

Currently, we see no problems on CI in this regard. Therefore, I am closing the issue. Thanks for your support, @andnig!