[BUG] `MSTL.inverse_transform` fails if `return_components=False`

eangius opened this issue a month ago · comments

Describe the bug
sktime is a great library, thanks. Perhaps this is user error, but using MSTL to remove multiple seasonalities from the endogenous variable works as expected as a standalone component or within a regular pipeline, yet it fails to generate predictions when placed within a TransformedTargetForecaster.

To Reproduce

from sktime.datasets import load_airline
from sktime.forecasting.compose import ForecastingPipeline, TransformedTargetForecaster
from sktime.forecasting.naive import NaiveForecaster
from sktime.transformations.series.detrend import MSTL

pipe = ForecastingPipeline(steps=[
    ('y', TransformedTargetForecaster(steps=[
        ('decompose', MSTL(periods=12, return_components=False)),
        ('forecaster',  NaiveForecaster()),
    ]))
]).fit(y=load_airline(), fh=[1, 2, 3])
y_pred = pipe.predict()  # exception here

Expected behavior
No exception at predict time. Seems like something is returning a pd.Series & something downstream is expecting a pd.DataFrame

Additional context

    Traceback (most recent call last):
      File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 575, in _get_axis_number
        return cls._AXIS_TO_AXIS_NUMBER[axis]
    KeyError: 1
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevconsole.py", line 364, in runcode
        coro = func()
      File "<input>", line 12, in <module>
      File ".venv/lib/python3.9/site-packages/sktime/forecasting/base/_base.py", line 448, in predict
        y_pred = self._predict(fh=fh, X=X_inner)
      File ".venv/lib/python3.9/site-packages/sktime/forecasting/compose/_pipeline.py", line 531, in _predict
        return self.forecaster_.predict(fh, X)
      File ".venv/lib/python3.9/site-packages/sktime/forecasting/base/_base.py", line 448, in predict
        y_pred = self._predict(fh=fh, X=X_inner)
      File ".venv/lib/python3.9/site-packages/sktime/forecasting/compose/_pipeline.py", line 1053, in _predict
        y_pred = self._get_inverse_transform(self.transformers_pre_, y_pred, X)
      File ".venv/lib/python3.9/site-packages/sktime/forecasting/compose/_pipeline.py", line 149, in _get_inverse_transform
        y = transformer.inverse_transform(y, X)
      File ".venv/lib/python3.9/site-packages/sktime/transformations/base.py", line 738, in inverse_transform
        Xt = self._inverse_transform(X=X_inner, y=y_inner)
      File ".venv/lib/python3.9/site-packages/sktime/transformations/series/detrend/mstl.py", line 203, in _inverse_transform
        row_sums = X.sum(axis=1)
      File ".venv/lib/python3.9/site-packages/pandas/core/series.py", line 6519, in sum
        return NDFrame.sum(self, axis, skipna, numeric_only, min_count, **kwargs)
      File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 12503, in sum
        return self._min_count_stat_function(
      File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 12486, in _min_count_stat_function
        return self._reduce(
      File ".venv/lib/python3.9/site-packages/pandas/core/series.py", line 6430, in _reduce
        self._get_axis_number(axis)
      File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 577, in _get_axis_number
        raise ValueError(f"No axis named {axis} for object type {cls.__name__}")
    ValueError: No axis named 1 for object type Series

Versions

Python dependencies:
    pip: 23.2.1
    sktime: 0.28.0
    sklearn: 1.4.2
    skbase: 0.7.5
    numpy: 1.26.4
    scipy: 1.13.0
    pandas: 2.2.1
    matplotlib: 3.8.4
    joblib: 1.4.0
    numba: 0.59.1
    statsmodels: 0.14.1
    pmdarima: 2.0.4
    statsforecast: 1.7.4
    tsfresh: None
    tslearn: None
    torch: None
    tensorflow: None

Franz Király · Answer 1 · Wed May 08 2024 04:38:49 GMT+0800 (China Standard Time)

How odd.

Strictly speaking, it is superfluous to wrap the TransformedTargetForecaster in ForecastingPipeline, but that should not impact behaviour (it should just be ignored, because it is a single-element pipeline).

I can see the issue: it is the axis=1 argument. If return_components=True, then the result will be a pd.DataFrame because it is multivariate. If it is False, then it is a pd.Series.
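
A minimal sketch of the type mismatch described above, using plain pandas rather than sktime internals. The helper `sum_components` is hypothetical and only illustrates a type guard; as noted further down the thread, returning a Series unchanged avoids the exception but does not add any seasonal component back.

```python
import pandas as pd


def sum_components(X):
    """Row-wise sum for a DataFrame; a Series is returned unchanged.

    Hypothetical helper for illustration, not the sktime implementation.
    """
    if isinstance(X, pd.DataFrame):
        return X.sum(axis=1)
    return X


# multivariate case (return_components=True): one column per component
components = pd.DataFrame({"trend": [1.0, 2.0], "seasonal": [0.5, -0.5]})
# univariate case (return_components=False): a single series
single = pd.Series([1.5, 1.5])

# a Series only has axis 0, so asking for axis=1 raises
try:
    single.sum(axis=1)
    raised = False
except ValueError:
    raised = True
```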

From a methodological standpoint, the inverse works correctly only in the return_components=True case, if the seasonal component is also forecast.

I see that one would expect that the seasonal component is continued periodically and added back.

So, returning X if pd.Series would remove the exception, but lead to unexpected behaviour, as the seasonal components are not added back.

Perchance, do you know, @eangius, is there an easy way to get an extrapolated form of all seasonal components in statsmodels MSTL? There should be?

Also FYI @luca-miniati who is the author and maintainer.

Franz Király · Answer 2 · Wed May 08 2024 04:40:29 GMT+0800 (China Standard Time)

To clarify, I think inverse_transform should do:

if return_components=True, exactly what it does currently - in this case it needs forecasters for all seasonal components and the residual if used in a pipeline
if return_components=False, a naive periodic continuation should be made for seasonal components, and it should be added to the transformed values. This might be slightly challenging, given that the index seen in _inverse_transform need not be contiguous with, or could intersect with, the index seen in fit.
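
The naive periodic continuation for the return_components=False case could be sketched roughly as follows. This assumes an integer index and a single seasonal component with a known period; `continue_seasonal` is a hypothetical helper, not part of sktime or statsmodels. Because positions are looked up by phase (index modulo period), the new index does not have to be contiguous with the fitted one and may overlap it.

```python
import pandas as pd


def continue_seasonal(seasonal, period, new_index):
    """Repeat the last fitted cycle of `seasonal` over `new_index`.

    Assumes an integer index; the phase of an index is `index % period`.
    """
    last_cycle = seasonal.iloc[-period:]
    # map each phase to the seasonal value observed in the last full cycle
    by_phase = {idx % period: val for idx, val in last_cycle.items()}
    values = [by_phase[i % period] for i in new_index]
    return pd.Series(values, index=new_index)


# toy seasonal component with period 3, "fitted" on indices 0..8
fitted = pd.Series([1.0, -2.0, 1.0] * 3, index=range(9))
# forecasts of the deseasonalized series for indices 9, 10, 11
transformed_pred = pd.Series([10.0, 10.0, 10.0], index=[9, 10, 11])

# add the continued seasonal component back onto the forecasts
restored = transformed_pred + continue_seasonal(fitted, 3, transformed_pred.index)
```

With several seasonal components, the same continuation would be applied per period and the results summed.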

Franz Király · Answer 3 · Wed May 08 2024 04:41:21 GMT+0800 (China Standard Time)

updated the issue title - imo the root cause is that MSTL.inverse_transform fails whenever return_components=False

Franz Király · Answer 4 · Wed May 08 2024 04:42:02 GMT+0800 (China Standard Time)

there is also a wider issue, namely that inverse_transform test coverage seems insufficient to detect this, which should be investigated.

Luca Miniati · Answer 5 · Wed May 08 2024 04:57:47 GMT+0800 (China Standard Time)

Hi Franz, long time no see! I'd like to implement the functionality for return_components=False.

Let me know if I understand the solution correctly:

make predictions of the seasonal time series, using the provided fh
add up all the values, and return as pd.Series

And a clarifying question: why would the index seen in fit potentially not match the index of _inverse_transform?

Elian Angius · Answer 6 · Wed May 08 2024 06:01:34 GMT+0800 (China Standard Time)

Thanks for the quick diagnosis @fkiraly. Unfortunately I’m still too new to statsmodels to tell how to extract all seasonal components.

For context, we are wrapping MSTL into a TransformedTargetForecaster because we have a previous processing step for the exogenous variables but that was not relevant to reproduce the problem.

As an extra bit of context, we tried return_components=True and filtering out the other columns in a FunctionTransformer to keep things univariate, but that gave us a different type of exception.

Franz Király · Answer 7 · Wed May 08 2024 06:14:20 GMT+0800 (China Standard Time)

@eangius, as possible workarounds for de/re-trending in a pipeline:

you can pipeline multiple Deseasonalizer-s, like Deseasonalizer(sp=24) * Deseasonalizer(sp=24*7) * my_forecaster for daily and weekly seasonality (if your data is hourly)
you can try StatsForecastMSTL, this is a forecaster that is optimized and with integrated MSTL, though with a heavier dependency footprint

Franz Király · Answer 8 · Wed May 08 2024 06:19:42 GMT+0800 (China Standard Time)

Hi Franz, long time no see!

Nice to hear from you again, as well!

I'd like to implement the functionality for return_components=False.

Great, let me know if I can help.

Let me know if I understand the solution correctly:

make predictions of the seasonal time series, using the provided fh

add up all the values, and return as pd.Series

Yes, this should happen when it is pipelined with a forecaster.

Though, the MSTL estimator is a transformer, so the transformer needs to carry out the transformation steps only.

So, we need to take the indices in _inverse_transform, and determine the periodic pattern implied by what was fitted on fit.

And a clarifying question: why would the index seen in fit potentially not match the index of _inverse_transform?

If you work out what happens in a forecasting pipeline, the transformer gets the historic indices in fit, e.g., 0, 1, 2, ..., 100, and the indices corresponding to the fh in predict. For an fh of 1, 2, 3, the X in _inverse_transform would have index 101, 102, 103.

If we have patterns of periodicities 3, 5, 7, denoting the indices of the periodic pattern by 3-0, 3-1, 3-2; 5-0, 5-1, ..., 5-4; 7-0, ..., 7-6 (dashes just for notation, not "minus"), then for indices 101, 102, 103 we should forecast, for the components, the indices 3-2, 3-0, 3-1; 5-1, 5-2, 5-3; 7-3, 7-4, 7-5.
(in python, we start counting with 0, so X-0 maps onto any index divisible without remainder by X)
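
The position within each periodic pattern is simply the index modulo the period. A small sketch in plain Python (the function name `seasonal_positions` is made up for illustration):

```python
def seasonal_positions(indices, periods):
    """For each period, map every index to its position within the cycle."""
    return {p: [i % p for i in indices] for p in periods}


# forecast indices 101, 102, 103 against periodicities 3, 5 and 7
positions = seasonal_positions([101, 102, 103], [3, 5, 7])
print(positions)  # {3: [2, 0, 1], 5: [1, 2, 3], 7: [3, 4, 5]}
```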

I think this already must be done somewhere in transform if return_components=True?