TDAmeritrade / stumpy

STUMPY is a powerful and scalable Python library for modern time series analysis

Home Page: https://stumpy.readthedocs.io/en/latest/

Ostinato Overwrites Original Time Series

seanlaw opened this issue · comments

There appears to be a bug in the ostinato algorithm (and its variants): in some rare cases, the time series in the list passed into the function may be incorrectly overwritten by the output of core.preprocess:

stumpy/stumpy/ostinato.py

Lines 372 to 375 in 9edd7f5

for i, T in enumerate(Ts):
    Ts[i], M_Ts[i], Σ_Ts[i], Ts_subseq_isconstant[i] = core.preprocess(
        T, m, T_subseq_isconstant=Ts_subseq_isconstant[i]
    )

This isn't a problem when there is no np.nan/np.inf in the data, since the time series that gets passed back is identical to the input time series. However, in the rare case where a time series contains np.nan/np.inf, those values are converted to 0.0 during preprocessing, and the converted time series then overwrites the original element within the input list of time series, Ts.
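The overwrite can be reproduced without STUMPY at all. The following is a minimal sketch, not the library's code: `preprocess_stand_in` is a hypothetical substitute for core.preprocess that only mimics the behavior described above (returning a new sequence with non-finite values set to 0.0), and plain Python lists stand in for NumPy arrays.

```python
import math


def preprocess_stand_in(T):
    """Hypothetical stand-in for core.preprocess (assumption, not the real API).

    Returns a NEW sequence in which nan/inf values are replaced by 0.0.
    """
    return [x if math.isfinite(x) else 0.0 for x in T]


def buggy(Ts):
    # Same pattern as in the issue: each result is assigned back into the
    # list object that the caller passed in, so the caller's list changes.
    for i, T in enumerate(Ts):
        Ts[i] = preprocess_stand_in(T)
    return Ts


Ts = [[1.0, math.nan, 3.0], [4.0, math.inf, 6.0]]
buggy(Ts)

# The caller's list no longer holds the original nan/inf values.
overwritten = Ts[0][1] == 0.0 and Ts[1][1] == 0.0
print(overwritten)
```

Note that the stand-in never mutates `T` itself; it is the assignment `Ts[i] = ...` that changes what the caller sees.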

What we should do is:

  1. Make a copy of the input list, Ts, and also of each time series, Ts[i], so that the original Ts/Ts[i] is NOT overwritten
  2. Add a unit test to check that the input is not overwritten in Ts (especially when there are np.nan/np.inf in the data)

Something like:

Ts_copy = []
for T in Ts:
    Ts_copy.append(T.copy())

and then use Ts_copy everywhere in place of Ts, overwriting its elements where necessary
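A rough sketch of how the copy and the accompanying unit test might fit together is shown here. It is illustrative only: `preprocess_stand_in`, `fixed`, `same_values`, and `test_input_not_overwritten` are hypothetical names, plain lists stand in for NumPy arrays, and the real test would presumably call the ostinato functions directly.

```python
import math


def preprocess_stand_in(T):
    """Hypothetical stand-in for core.preprocess (assumption, not the real API)."""
    return [x if math.isfinite(x) else 0.0 for x in T]


def fixed(Ts):
    # Copy the list and each time series, then only ever write to the copy.
    Ts_copy = []
    for T in Ts:
        Ts_copy.append(T.copy())
    for i, T in enumerate(Ts_copy):
        Ts_copy[i] = preprocess_stand_in(T)
    return Ts_copy


def same_values(a, b):
    # nan != nan, so positions where both values are nan are treated as equal.
    return len(a) == len(b) and all(
        (math.isnan(x) and math.isnan(y)) or x == y for x, y in zip(a, b)
    )


def test_input_not_overwritten():
    Ts = [[1.0, math.nan, 3.0], [4.0, math.inf, 6.0]]
    Ts_ref = [T.copy() for T in Ts]  # snapshot taken before the call
    out = fixed(Ts)
    for T, T_ref in zip(Ts, Ts_ref):
        assert same_values(T, T_ref)  # input still contains nan/inf
    return out


out = test_input_not_overwritten()
```

The snapshot has to be compared with a nan-aware check, since a plain equality test would fail on unchanged nan values.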

#980 is blocked until this is resolved

@NimaSarajpoor I noticed that the unit tests that were added for this are adding a significant amount of time (~45 minutes) to our GitHub workflow. Prior to this PR, the workflows were completing in around 45-55 minutes. Now, they are taking between 1.5 and 2 hours :(

Would you mind taking a look?