TDAmeritrade / stumpy

STUMPY is a powerful and scalable Python library for modern time series analysis

Home Page: https://stumpy.readthedocs.io/en/latest/

Functionality of stump not clear to me

Noskario opened this issue · comments

Several things are unclear to me:

  1. I have already written a Stack Overflow question about this, see https://stackoverflow.com/questions/76729654/why-is-the-behaviour-of-stumpy-stump-changing-so-abruptly-why-is-it-unable-to-m . To me this seems to be a bug.
  2. stumpy.stump(T_A=np.array([0.,0,0,0,0,0,0]), m=4, T_B=np.zeros(30), k=10, ignore_trivial=True) produces the following warnings:

UserWarning: Arrays T_A, T_B are not equal, which implies an AB-join. ignore_trivial has been automatically set to False.
and
UserWarning: A large number of values in P are smaller than 1e-06.
For a self-join, try setting ignore_trivial=True.

So to me it seems the parameter ignore_trivial has no effect.

  3. The result of the above computation is
array([[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 1, 2, 3, 4,
       5, 6, 7, 8, 9, -1, -1],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 1, 2, 3, 4,
       5, 6, 7, 8, 9, -1, -1],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 1, 2, 3, 4,
       5, 6, 7, 8, 9, -1, -1],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 1, 2, 3, 4,
       5, 6, 7, 8, 9, -1, -1]], dtype=object)

Is there a way to exclude the second best hits that overlap with the best hit? So instead of 0,1,2,3,... I would like to have something like 0,4,8,... (no overlap between the hits; recall that m=4).

> 1. I have already written a stackoverflow question for this, see https://stackoverflow.com/questions/76729654/why-is-the-behaviour-of-stumpy-stump-changing-so-abruptly-why-is-it-unable-to-m To me this seems to be a bug

I have posted an answer on Stack Overflow. Feel free to continue that part of the discussion there, or post your question here.

> stumpy.stump(T_A=np.array([0.,0,0,0,0,0,0]), m=4, T_B=np.zeros(30), k=10, ignore_trivial=True) produces the following warnings:

The parameter ignore_trivial=True only takes effect in a self-join. Consider an AB-join (not a BA-join) and the following two subsequences:

# m : window size

S_i  = T_A[i : i + m]
S_j  = T_B[j : j + m]

Now, the question is: how can the trivial-match zone, which is defined around S_i within its own time series, be related to S_j, which lives in a different time series? It cannot. That is why, when the program detects that the two time series are not the same, it automatically sets ignore_trivial to False. In a nutshell, this parameter is meaningless in an AB-join.
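To make the distinction concrete, here is a minimal, dependency-free sketch of a brute-force nearest-neighbor search. It is not STUMPY's implementation: `nearest_neighbors` is a hypothetical helper, it uses plain (non-normalized) Euclidean distance rather than STUMPY's default z-normalized distance, and the exclusion zone of `ceil(m / 4)` is an assumption meant to mirror STUMPY's default. The only point is that the exclusion zone compares indices `i` and `j`, which is meaningful only when both index the same time series.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(T_A, m, T_B=None, excl_zone=None):
    """Hypothetical helper: for every subsequence of T_A, return
    (distance, index) of its nearest neighbor.

    If T_B is None this is a self-join and trivial matches are skipped;
    otherwise it is an AB-join and no exclusion zone is applied.
    """
    self_join = T_B is None
    if self_join:
        T_B = T_A
        if excl_zone is None:
            # Assumption: mirrors STUMPY's default exclusion zone.
            excl_zone = math.ceil(m / 4)

    result = []
    for i in range(len(T_A) - m + 1):
        S_i = T_A[i : i + m]
        best = (math.inf, -1)
        for j in range(len(T_B) - m + 1):
            # i and j are only comparable when they index the same series.
            if self_join and abs(i - j) <= excl_zone:
                continue
            d = euclidean(S_i, T_B[j : j + m])
            if d < best[0]:
                best = (d, j)
        result.append(best)
    return result


T = [0.0, 1.0, 3.0, 2.0, 0.0, 1.0, 3.0, 2.0, 5.0]
self_nn = nearest_neighbors(T, m=4)            # self-join
ab_nn = nearest_neighbors(T, m=4, T_B=list(T))  # treated as an AB-join
```

In the self-join, the subsequence at index 0 is matched with its repeat at index 4, because the match with itself falls inside the exclusion zone. In the AB-join, index 0 of `T_A` is free to match index 0 of `T_B`, since the two indices refer to different series.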

> The result of the above computation is
> array([[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 1, 2, 3, 4,
>        5, 6, 7, 8, 9, -1, -1],
>       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 1, 2, 3, 4,
>        5, 6, 7, 8, 9, -1, -1],
>       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 1, 2, 3, 4,
>        5, 6, 7, 8, 9, -1, -1],
>       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 1, 2, 3, 4,
>        5, 6, 7, 8, 9, -1, -1]], dtype=object)
>
> Is there a way to exclude the second best hits that overlap with the best hit? So instead of 0,1,2,3,... I would like to have something like 0,4,8,... (no overlap between the hits; recall that m=4).

STUMPY avoids this because it requires a considerable amount of memory for long time series. If your time series is not that long, you can simply compute the pairwise distances between ALL subsequences and then use the resulting distance matrix to get what you are looking for.
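As a rough illustration of that suggestion, the sketch below computes the distances from one query subsequence to every subsequence of the other series and then greedily keeps the best hits whose start indices are at least m apart. `top_k_non_overlapping` is a hypothetical helper, not part of STUMPY, and it uses plain Euclidean distance instead of STUMPY's default z-normalized distance, which sidesteps the special handling that constant subsequences would otherwise need. Pure-Python lists are used so the example has no dependencies.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_non_overlapping(query, T, m, k):
    """Hypothetical helper: return up to k (distance, index) pairs for the
    subsequences of T closest to `query`, such that no two selected
    subsequences overlap (start indices differ by at least m)."""
    # One row of the full distance matrix: query vs. every subsequence of T.
    candidates = [
        (euclidean(query, T[j : j + m]), j) for j in range(len(T) - m + 1)
    ]
    candidates.sort()  # ascending distance, ties broken by index

    picked = []
    for d, j in candidates:
        # Keep the candidate only if it overlaps none of the earlier picks.
        if all(abs(j - p) >= m for _, p in picked):
            picked.append((d, j))
            if len(picked) == k:
                break
    return picked


# Same data as in the question, as plain lists.
T_A = [0.0] * 7
T_B = [0.0] * 30
m = 4

hits = top_k_non_overlapping(T_A[0:m], T_B, m, k=10)
indices = [j for _, j in hits]
```

For the all-zero data from the question this yields the start indices 0, 4, 8, ..., 24. Only seven hits are returned even though k=10, because a series of length 30 cannot hold more non-overlapping windows of length 4 that start at a valid index. Repeating the call for every subsequence of T_A gives the full non-overlapping top-k table, at the memory and time cost mentioned above.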

Thank you for your responses!

@Noskario Thank you for your question. In the future, where appropriate, please consider posting usage/API/understanding related questions to our GitHub Discussions section. There, you can engage with the broader user community (we monitor it closely as well). Our GitHub Issues are typically reserved for bug reports, feature requests, and tracking code-related issues. I understand that this one was unclear, and there is no problem with bringing it up here (i.e., you have done nothing wrong); I just wanted to make sure that you were aware of the alternative space.