TDAmeritrade / stumpy

STUMPY is a powerful and scalable Python library for modern time series analysis

Home Page: https://stumpy.readthedocs.io/en/latest/

incorrect number of iterations in `snippets._get_all_mpdist_profiles`

NimaSarajpoor opened this issue

Issue

In the function _get_all_mpdist_profiles, the time series T is right-padded so that its length is a multiple of m.

stumpy/stumpy/snippets.py

Lines 103 to 108 in 1ddb950

right_pad = 0
T_subseq_isconstant = core.process_isconstant(T, s, mpdist_T_subseq_isconstant)
if T.shape[0] % m != 0:
    right_pad = int(m * np.ceil(T.shape[0] / m) - T.shape[0])
    pad_width = (0, right_pad)
    T = np.pad(T, pad_width, mode="constant", constant_values=np.nan)

Note that the length of the original T does not change if len(T) % m == 0. In that case, the following for-loop appears to miss the last non-overlapping window of length m:

stumpy/stumpy/snippets.py

Lines 113 to 123 in 1ddb950

n_padded = T.shape[0]
D = np.empty(((n_padded // m) - 1, n_padded - m + 1), dtype=np.float64)
M_T, Σ_T = core.compute_mean_std(T, s)
# Iterate over non-overlapping subsequences, see Definition 3
for i in range((n_padded // m) - 1):
    start = i * m
    stop = (i + 1) * m
    S_i = T[start:stop]
    D[i, :] = _mpdist_vect(
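
To make the off-by-one concrete, here is a minimal sketch that only reproduces the index arithmetic of the two excerpts above. The helper names (padded_length, windows_visited, full_windows) are hypothetical and are not part of stumpy; no distance computation is performed.

```python
import numpy as np


def padded_length(n, m):
    # Mirrors the padding step: the length is only changed when n % m != 0
    if n % m != 0:
        return int(m * np.ceil(n / m))
    return n


def windows_visited(n, m):
    # Number of iterations of the current loop: range((n_padded // m) - 1)
    n_padded = padded_length(n, m)
    return (n_padded // m) - 1


def full_windows(n, m):
    # Number of non-overlapping, full-length windows in the original T
    return n // m


# len(T) % m == 0: the loop visits one window fewer than exist
exact = (windows_visited(10, 5), full_windows(10, 5))

# len(T) % m != 0: padding adds one window, so the "- 1" happens to be right
ragged = (windows_visited(12, 5), full_windows(12, 5))

print(exact, ragged)
```

Under these assumptions, the two counts agree only when padding was applied, which matches the behaviour described in this issue.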

Solution

Before the check if T.shape[0] % m != 0:, we can add n_windows = T.shape[0] // m and then run the for-loop over range(n_windows).

@NimaSarajpoor In both cases (with and without padding), wouldn't it be sufficient to do:

for i in range(n_padded // m):

That is, we simply omit the - 1.

First, I should mention that my aim was to fix the shape of the distance matrix D as well: D.shape[0] should be n_windows.

wouldn't it be sufficient to do:

I don't think it would, since I believe a snippet needs to have the full length m. Therefore, if len(T) % m != 0, the tail T[-r:] (where r = len(T) % m) should not be considered a snippet.
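
The point in the reply above can be illustrated with a small sketch. It assumes a toy series of length 12 with m = 5 and reuses the padding expression from the excerpt in the issue; it is not taken from the stumpy source. Iterating over range(n_padded // m) would include a final window that is partly NaN padding.

```python
import numpy as np

m = 5
T = np.arange(12, dtype=np.float64)  # toy series, len(T) % m == 2

# Same padding expression as in the excerpt from snippets.py
right_pad = int(m * np.ceil(T.shape[0] / m) - T.shape[0])
T_padded = np.pad(T, (0, right_pad), mode="constant", constant_values=np.nan)

# Check each non-overlapping window of the padded series for NaN padding
n_padded = T_padded.shape[0]
has_nan = [
    bool(np.isnan(T_padded[i * m : (i + 1) * m]).any())
    for i in range(n_padded // m)
]

print(has_nan)  # only the last window contains padding
```

Here the third window holds the two trailing values of T followed by three NaNs, so it is not a full-length subsequence of the original series.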

Before the check if T.shape[0] % m != 0:, we can add n_windows = T.shape[0] // m and then run the for-loop over range(n_windows).

I accept your proposal. However, can we call it n_contiguous_windows ("contiguous" == "non-overlapping")?

Sure. n_contiguous_windows is more informative. Will make changes and submit a PR.
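
A minimal sketch of the agreed change, assuming only the pieces quoted in this thread: n_contiguous_windows is computed from the unpadded length, and both the first dimension of D and the loop range use it. The function name contiguous_window_layout is hypothetical, and the actual PR may differ. The sketch returns the shape D would have and the window bounds, without calling _mpdist_vect.

```python
import numpy as np


def contiguous_window_layout(T, m):
    # Count full-length, non-overlapping windows BEFORE any padding
    n_contiguous_windows = T.shape[0] // m

    # Padding step as in the excerpt from snippets.py
    right_pad = 0
    if T.shape[0] % m != 0:
        right_pad = int(m * np.ceil(T.shape[0] / m) - T.shape[0])
        pad_width = (0, right_pad)
        T = np.pad(T, pad_width, mode="constant", constant_values=np.nan)

    n_padded = T.shape[0]

    # D gets one row per full window instead of (n_padded // m) - 1 rows
    D_shape = (n_contiguous_windows, n_padded - m + 1)

    # Loop over range(n_contiguous_windows) instead of range((n_padded // m) - 1)
    bounds = [(i * m, (i + 1) * m) for i in range(n_contiguous_windows)]
    return D_shape, bounds


shape_exact, bounds_exact = contiguous_window_layout(np.arange(10.0), 5)
shape_ragged, bounds_ragged = contiguous_window_layout(np.arange(12.0), 5)
```

With len(T) = 10 the last window (5, 10) is now included, and with len(T) = 12 the partly padded tail is still excluded.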