[BUG] `MSTL.inverse_transform` fails if `return_components=False`
eangius opened this issue · comments
Describe the bug
Sktime is a great library, thanks. Perhaps this is user error, but running MSTL to remove multiple seasonalities from endogenous variables works as expected as a standalone component or within a regular pipeline, yet fails to generate predictions when used within a TransformedTargetForecaster.
To Reproduce
from sktime.forecasting.naive import *
from sktime.forecasting.compose import *
from sktime.transformations.series.detrend import *
from sktime.datasets import load_airline
pipe = ForecastingPipeline(steps=[
    ('y', TransformedTargetForecaster(steps=[
        ('decompose', MSTL(periods=12, return_components=False)),
        ('forecaster', NaiveForecaster()),
    ]))
]).fit(y=load_airline(), fh=[1, 2, 3])
y_pred = pipe.predict()  # exception here
Expected behavior
No exception at predict time. It seems that something is returning a pd.Series and something downstream is expecting a pd.DataFrame.
Additional context
Traceback (most recent call last):
File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 575, in _get_axis_number
return cls._AXIS_TO_AXIS_NUMBER[axis]
KeyError: 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevconsole.py", line 364, in runcode
coro = func()
File "<input>", line 12, in <module>
File ".venv/lib/python3.9/site-packages/sktime/forecasting/base/_base.py", line 448, in predict
y_pred = self._predict(fh=fh, X=X_inner)
File ".venv/lib/python3.9/site-packages/sktime/forecasting/compose/_pipeline.py", line 531, in _predict
return self.forecaster_.predict(fh, X)
File ".venv/lib/python3.9/site-packages/sktime/forecasting/base/_base.py", line 448, in predict
y_pred = self._predict(fh=fh, X=X_inner)
File ".venv/lib/python3.9/site-packages/sktime/forecasting/compose/_pipeline.py", line 1053, in _predict
y_pred = self._get_inverse_transform(self.transformers_pre_, y_pred, X)
File ".venv/lib/python3.9/site-packages/sktime/forecasting/compose/_pipeline.py", line 149, in _get_inverse_transform
y = transformer.inverse_transform(y, X)
File ".venv/lib/python3.9/site-packages/sktime/transformations/base.py", line 738, in inverse_transform
Xt = self._inverse_transform(X=X_inner, y=y_inner)
File ".venv/lib/python3.9/site-packages/sktime/transformations/series/detrend/mstl.py", line 203, in _inverse_transform
row_sums = X.sum(axis=1)
File ".venv/lib/python3.9/site-packages/pandas/core/series.py", line 6519, in sum
return NDFrame.sum(self, axis, skipna, numeric_only, min_count, **kwargs)
File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 12503, in sum
return self._min_count_stat_function(
File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 12486, in _min_count_stat_function
return self._reduce(
File ".venv/lib/python3.9/site-packages/pandas/core/series.py", line 6430, in _reduce
self._get_axis_number(axis)
File ".venv/lib/python3.9/site-packages/pandas/core/generic.py", line 577, in _get_axis_number
raise ValueError(f"No axis named {axis} for object type {cls.__name__}")
ValueError: No axis named 1 for object type Series
Versions
How odd.
Strictly speaking, it is superfluous to wrap the TransformedTargetForecaster in a ForecastingPipeline, but that should not impact behaviour (it should just be ignored, because it is a single-element pipeline).
I can see the issue: it is the axis=1 argument. If return_components=True, then the result will be a pd.DataFrame, because it is multivariate. If it is False, then it is a pd.Series, which has no axis 1.
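The difference can be reproduced with pandas alone. The following is a minimal sketch, not the sktime code path itself: the column names and values are made up for illustration, and only the axis=1 call mirrors the line shown in the traceback.

```python
import pandas as pd

# One column per component, as one would get with return_components=True.
# Column names and values are hypothetical.
components = pd.DataFrame({"trend": [1.0, 2.0], "seasonal_12": [0.5, -0.5]})
row_sums = components.sum(axis=1)  # works: a DataFrame has a column axis

# A single univariate series, as one would get with return_components=False.
univariate = pd.Series([1.5, 1.5])
try:
    univariate.sum(axis=1)
    error_message = None
except ValueError as exc:
    # A Series only has axis 0, so pandas rejects axis=1.
    error_message = str(exc)

print(row_sums.tolist())
print(error_message)
```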
From a methodological standpoint, the inverse works correctly only in the return_components=True case, if the seasonal component is also forecast.
I see that one would expect the seasonal component to be continued periodically and added back.
So, returning X unchanged if it is a pd.Series would remove the exception, but lead to unexpected behaviour, as the seasonal components would not be added back.
Do you know, @eangius, whether there is an easy way to get an extrapolated form of all seasonal components in statsmodels MSTL? I would expect there to be one.
Also FYI @luca-miniati who is the author and maintainer.
To clarify, I think inverse_transform should do:
- if return_components=True, exactly what it does currently. In this case it needs forecasters for all seasonal components and the residual if used in a pipeline.
- if return_components=False, a naive periodic continuation should be made for the seasonal components, and it should be added to the transformed values. This might be slightly challenging, given that the index seen in _inverse_transform need not be contiguous with, or could intersect with, the index seen in fit.
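The return_components=False case described above could be sketched roughly as follows. This is an illustration in plain Python, not sktime or statsmodels code: the helper names, the fit_start parameter, and the idea of storing one fitted cycle per seasonal component (for instance the last full cycle seen in fit) are all assumptions made for the example.

```python
def continue_seasonal(pattern, index, fit_start=0):
    """Extend one fitted seasonal cycle to arbitrary integer indices.

    pattern[k] is the seasonal value at phase k, where the phase of an
    index i is (i - fit_start) % len(pattern). The index may be
    non-contiguous with, or overlap, the index seen in fit.
    """
    period = len(pattern)
    return [pattern[(i - fit_start) % period] for i in index]


def add_back_seasonality(values, index, patterns, fit_start=0):
    """Add the periodic continuation of every seasonal pattern to values."""
    restored = list(values)
    for pattern in patterns:
        seasonal = continue_seasonal(pattern, index, fit_start)
        restored = [v + s for v, s in zip(restored, seasonal)]
    return restored


# Hypothetical fitted cycles with periods 2 and 3, fitted on an index starting at 0.
patterns = [[1.0, -1.0], [0.5, 0.0, -0.5]]
restored = add_back_seasonality([10.0, 10.0, 10.0], [101, 102, 103], patterns)
print(restored)  # [8.5, 11.5, 9.0]
```

Because the phase is computed from the index itself rather than from the position within the input, the same helper also covers indices that fall inside the fitted range or skip values.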
Updated the issue title: imo the root cause is that MSTL.inverse_transform fails whenever return_components=False.
There is also a wider issue, namely that inverse_transform test coverage seems insufficient to detect this, which should be investigated.
Hi Franz, long time no see! I'd like to implement the functionality for return_components=False.
Let me know if I understand the solution correctly:
- make predictions of the seasonal time series, using the provided fh
- add up all the values, and return as pd.Series
And a clarifying question: why would the index seen in fit potentially not match the index of _inverse_transform?
Thanks for the quick diagnostic @fkiraly. Unfortunately I'm still too much of a newbie with statsmodels to tell how to extract all seasonal components.
For context, we are wrapping MSTL into a TransformedTargetForecaster because we have a previous processing step for the exogenous variables, but that was not relevant to reproducing the problem.
As an extra bit of context, we tried return_components=True and filtering out the other columns in a FunctionTransformer to keep things univariate, but that gave us a different type of exception.
@eangius, as possible workarounds for de/re-trending in a pipeline:
- you can pipeline multiple Deseasonalizer-s, like Deseasonalizer(sp=24) * Deseasonalizer(sp=24*7) * my_forecaster for daily and weekly seasonality (if your data is hourly)
- you can try StatsforecastMSTL, this is a forecaster that is optimized and with integrated MSTL, though with a heavier dependency footprint
Hi Franz, long time no see!
Nice to hear from you again, as well!
I'd like to implement the functionality for return_components=False.
Great, let me know if I can help.
Let me know if I understand the solution correctly:
- make predictions of the seasonal time series, using the provided fh
- add up all the values, and return as pd.Series
Yes, this should happen when it is pipelined with a forecaster.
Though, the MSTL estimator is a transformer, so the transformer needs to carry out the transformation steps only.
So, we need to take the indices in _inverse_transform, and determine the periodic pattern implied by what was fitted in fit.
And a clarifying question: why would the index seen in fit potentially not match the index of _inverse_transform?
If you work out what happens in a forecasting pipeline, the transformer gets the historic indices in fit, e.g., 0, 1, 2, ..., 100, and the indices corresponding to the fh in predict: for a fh of 1, 2, 3, the X in _inverse_transform would have index 101, 102, 103.
If we have patterns of periodicities 3, 5, 7, denoting the indices of the periodic pattern by 3-0, 3-1, 3-2; 5-0, 5-1, ..., 5-4; 7-0, ..., 7-6 (dashes just for notation, not "minus"), then for indices 101, 102, 103 we should forecast, for the components, the indices 3-2, 3-0, 3-1; 5-1, 5-2, 5-3; 7-3, 7-4, 7-5.
(In Python, we start counting with 0, so X-0 maps onto any index divisible without remainder by X.)
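The phase of each index can be checked with a few lines of Python. This is only a sketch of the arithmetic: seasonal_phases is a hypothetical helper, and it assumes the fitted index starts at 0, so that the phase of an index is simply its remainder modulo the period.

```python
def seasonal_phases(index, periods):
    """Map each period p to the phases (index modulo p) of the given indices."""
    return {p: [i % p for i in index] for p in periods}


phases = seasonal_phases([101, 102, 103], [3, 5, 7])
print(phases)  # {3: [2, 0, 1], 5: [1, 2, 3], 7: [3, 4, 5]}
```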
I think this must already be done somewhere in transform if return_components=True?