sktime / sktime

A unified framework for machine learning with time series

Home Page: https://www.sktime.net

[BUG] `MSTL.inverse_transform` fails if `return_components=False`

eangius opened this issue · comments

Describe the bug
sktime is a great library, thanks. Perhaps this is user error, but using MSTL to remove multiple seasonalities from endogenous variables works as expected as a standalone component or within a regular pipeline, yet fails to generate predictions when used within a TransformedTargetForecaster.

To Reproduce

from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.compose import ForecastingPipeline, TransformedTargetForecaster
from sktime.transformations.series.detrend import MSTL
from sktime.datasets import load_airline

pipe = ForecastingPipeline(steps=[
    ('y', TransformedTargetForecaster(steps=[
        ('decompose', MSTL(periods=12, return_components=False)),
        ('forecaster',  NaiveForecaster()),
    ]))
]).fit(y=load_airline(), fh=[1, 2, 3])
y_pred = pipe.predict()  # exception here

Expected behavior
No exception at predict time. It seems that something is returning a pd.Series while something downstream expects a pd.DataFrame.

Additional context

    Traceback (most recent call last):
      File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 575, in _get_axis_number
        return cls._AXIS_TO_AXIS_NUMBER[axis]
    KeyError: 1
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevconsole.py", line 364, in runcode
        coro = func()
      File "<input>", line 12, in <module>
      File ".venv/lib/python3.9/site-packages/sktime/forecasting/base/_base.py", line 448, in predict
        y_pred = self._predict(fh=fh, X=X_inner)
      File ".venv/lib/python3.9/site-packages/sktime/forecasting/compose/_pipeline.py", line 531, in _predict
        return self.forecaster_.predict(fh, X)
      File ".venv/lib/python3.9/site-packages/sktime/forecasting/base/_base.py", line 448, in predict
        y_pred = self._predict(fh=fh, X=X_inner)
      File ".venv/lib/python3.9/site-packages/sktime/forecasting/compose/_pipeline.py", line 1053, in _predict
        y_pred = self._get_inverse_transform(self.transformers_pre_, y_pred, X)
      File ".venv/lib/python3.9/site-packages/sktime/forecasting/compose/_pipeline.py", line 149, in _get_inverse_transform
        y = transformer.inverse_transform(y, X)
      File ".venv/lib/python3.9/site-packages/sktime/transformations/base.py", line 738, in inverse_transform
        Xt = self._inverse_transform(X=X_inner, y=y_inner)
      File ".venv/lib/python3.9/site-packages/sktime/transformations/series/detrend/mstl.py", line 203, in _inverse_transform
        row_sums = X.sum(axis=1)
      File ".venv/lib/python3.9/site-packages/pandas/core/series.py", line 6519, in sum
        return NDFrame.sum(self, axis, skipna, numeric_only, min_count, **kwargs)
      File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 12503, in sum
        return self._min_count_stat_function(
      File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 12486, in _min_count_stat_function
        return self._reduce(
      File ".venv/lib/python3.9/site-packages/pandas/core/series.py", line 6430, in _reduce
        self._get_axis_number(axis)
      File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 577, in _get_axis_number
        raise ValueError(f"No axis named {axis} for object type {cls.__name__}")
    ValueError: No axis named 1 for object type Series

Versions

Python dependencies:
    pip: 23.2.1
    sktime: 0.28.0
    sklearn: 1.4.2
    skbase: 0.7.5
    numpy: 1.26.4
    scipy: 1.13.0
    pandas: 2.2.1
    matplotlib: 3.8.4
    joblib: 1.4.0
    numba: 0.59.1
    statsmodels: 0.14.1
    pmdarima: 2.0.4
    statsforecast: 1.7.4
    tsfresh: None
    tslearn: None
    torch: None
    tensorflow: None

How odd.

Strictly speaking, it is superfluous to wrap the TransformedTargetForecaster in ForecastingPipeline, but that should not impact behaviour (the wrapper should simply be ignored, because it is a single-element pipeline).

I can see the issue: it is the axis=1 argument in X.sum(axis=1). If return_components=True, then the result will be a pd.DataFrame because it is multivariate. If it is False, then it is a pd.Series, which has no axis 1.
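The difference can be reproduced with pandas alone. The following is a minimal sketch with made-up values and hypothetical column names, not the sktime code itself:

```python
import pandas as pd

# Made-up values; the column names are illustrative only.
components = pd.DataFrame({"trend": [1.0, 2.0], "seasonal": [0.5, -0.5]})
single = pd.Series([1.5, 1.5])

# DataFrame: axis=1 sums across columns, giving one value per row.
row_sums = components.sum(axis=1)

# Series: only axis 0 exists, so axis=1 raises ValueError.
try:
    single.sum(axis=1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(row_sums.tolist())
print(error_message)
```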

From a methodological standpoint, the inverse works correctly only in the return_components=True case, if the seasonal component is also forecast.

I see that one would expect that the seasonal component is continued periodically and added back.

So, returning X unchanged when it is a pd.Series would remove the exception, but lead to unexpected behaviour, as the seasonal components would not be added back.

@eangius, do you happen to know whether there is an easy way to get an extrapolated form of all seasonal components in statsmodels MSTL? I would expect there to be one.

Also FYI @luca-miniati who is the author and maintainer.

To clarify, I think inverse_transform should do:

  • if return_components=True, exactly what it does currently - in this case it needs forecasters for all seasonal components and the residual if used in a pipeline
  • if return_components=False, a naive periodic continuation should be made for seasonal components, and it should be added to the transformed values. This might be slightly challenging, given that the index seen in _inverse_transform need not be contiguous with, or could intersect with, the index seen in fit.
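The second case could be sketched roughly as follows. This is not the sktime implementation: `continue_seasonal` and `add_back_seasonality` are hypothetical helpers, the index is assumed to be integer and contiguous in fit, and repeating the last fitted cycle is just one possible choice of "naive periodic continuation" (STL-type seasonal components can drift over time).

```python
import numpy as np
import pandas as pd


def continue_seasonal(seasonal, period, new_index):
    """Repeat the last fitted cycle of `seasonal` at arbitrary integer indices.

    Hypothetical helper: assumes a contiguous integer index in fit and at
    least `period` fitted observations.
    """
    last_cycle = seasonal.iloc[-period:]
    # Key each value of the last cycle by its phase, i.e. index mod period.
    by_phase = pd.Series(last_cycle.to_numpy(), index=last_cycle.index % period)
    phases = np.asarray(new_index) % period
    return pd.Series(by_phase.loc[phases].to_numpy(), index=new_index)


def add_back_seasonality(y_transformed, seasonal_components, periods):
    """Add the continued seasonal components back onto transformed values."""
    total = sum(
        continue_seasonal(s, p, y_transformed.index)
        for s, p in zip(seasonal_components, periods)
    )
    return y_transformed + total


# Toy seasonal components "fitted" on indices 0..11, with periods 3 and 2.
seasonal_3 = pd.Series([1.0, 0.0, -1.0] * 4)
seasonal_2 = pd.Series([0.5, -0.5] * 6)

# Transformed forecasts at indices 12, 13, 14 (not seen in fit).
y_transformed = pd.Series([10.0, 10.0, 10.0], index=[12, 13, 14])
restored = add_back_seasonality(y_transformed, [seasonal_3, seasonal_2], [3, 2])
print(restored.tolist())  # [11.5, 9.5, 9.5]
```

Because the lookup is keyed by phase rather than by position, the same helper also handles indices that overlap the fit index or are not contiguous with it. Date-like indices would first need to be converted to integer offsets, which is not covered here.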

updated the issue title - imo the root cause is that MSTL.inverse_transform fails whenever return_components=False

there is also a wider issue, namely that inverse_transform test coverage seems insufficient to detect this, which should be investigated.

Hi Franz, long time no see! I'd like to implement the functionality for return_components=False.

Let me know if I understand the solution correctly:

  • make predictions of the seasonal time series, using the provided fh
  • add up all the values, and return as pd.Series

And a clarifying question: why would the index seen in fit potentially not match the index of _inverse_transform?

Thanks for the quick diagnosis, @fkiraly. Unfortunately, I'm still too much of a noob with statsmodels to tell how to extract all seasonal components.

For context, we are wrapping MSTL into a TransformedTargetForecaster because we have a previous processing step for the exogenous variables but that was not relevant to reproduce the problem.

As an extra bit of context, we tried return_components=True and filtered out the other columns in a FunctionTransformer to keep things univariate, but that gave us a different type of exception.

@eangius, as possible workarounds for de/re-trending in a pipeline:

  • you can pipeline multiple Deseasonalizer-s, like Deseasonalizer(sp=24) * Deseasonalizer(sp=24*7) * my_forecaster for daily and weekly seasonality (if your data is hourly)
  • you can try StatsforecastMSTL, this is a forecaster that is optimized and with integrated MSTL, though with a heavier dependency footprint

> Hi Franz, long time no see!

Nice to hear from you again, as well!

> I'd like to implement the functionality for return_components=False.

Great, let me know if I can help.

> Let me know if I understand the solution correctly:
>
>   • make predictions of the seasonal time series, using the provided fh
>   • add up all the values, and return as pd.Series

Yes, this should happen when it is pipelined with a forecaster.

Though, the MSTL estimator is a transformer, so the transformer needs to carry out the transformation steps only.

So, we need to take the indices in _inverse_transform, and determine the periodic pattern implied by what was fitted on fit.

> And a clarifying question: why would the index seen in fit potentially not match the index of _inverse_transform?

If you work out what happens in a forecasting pipeline, the transformer sees the historic indices in fit, e.g., 0, 1, 2, ..., 100, and the indices corresponding to the fh in predict. For an fh of 1, 2, 3, the X in _inverse_transform would have index 101, 102, 103.

If we have patterns of periodicities 3, 5, 7, denoting the positions within each periodic pattern by 3-0, 3-1, 3-2; 5-0, 5-1, ..., 5-4; 7-0, ..., 7-6 (dashes are just notation, not "minus"), then for indices 101, 102, 103 we should forecast, for the components, the positions 3-2, 3-0, 3-1; 5-1, 5-2, 5-3; 7-3, 7-4, 7-5.
(In Python, we start counting at 0, so X-0 maps onto any index divisible by X without remainder.)
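The position of each index within each pattern is simply the index modulo the period. A quick check, assuming the fit index starts at 0:

```python
import numpy as np

# Indices seen in _inverse_transform for fh = 1, 2, 3 after fitting on 0..100.
indices = np.array([101, 102, 103])
periods = [3, 5, 7]

# Position within each periodic pattern: index modulo period.
phases = {p: (indices % p).tolist() for p in periods}
print(phases)  # {3: [2, 0, 1], 5: [1, 2, 3], 7: [3, 4, 5]}
```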

I think this already must be done somewhere in transform if return_components=True?