TDAmeritrade / stumpy

STUMPY is a powerful and scalable Python library for modern time series analysis

Home Page: https://stumpy.readthedocs.io/en/latest/

Snippets Unit Test Assertion Failure

seanlaw opened this issue

It appears that the snippets unit tests are failing here, and it's not clear whether this is related to the snippet comment in #828.

It happened here as well

edit

Additional cases:
https://github.com/TDAmeritrade/stumpy/actions/runs/5633298262/job/15262019886

@NimaSarajpoor The max absolute difference seems to be rather large. Are you able to take a look?

> The max absolute difference seems to be rather large.

Yes, I noticed it as well.

> Are you able to take a look?

I will take a look. It has been on my radar.

# A failure case

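import numpy as np

m = 10  # snippet (subsequence) length
k = 3  # number of snippets
s = 3  # sub-subsequence length used by mpdist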

seed = 332
np.random.seed(seed)
T = np.random.uniform(-1000.0, 1000.0, [64])

# in tests/test_snippets.py
test_mpdist_snippets_s_with_isconstant(T, m, k, s)

[WIP]
Despite the big difference between ref and cmp in the test function, this might be caused by a loss of precision!

The snippet index is computed based on the profile_areas. At some point, the array profile_areas from the naive version (ref) has the following values:
[0.07453704912719797, 0.07453704912719797, ...]

And, the array computed from the performant version has the following values:
[0.07453704912853336, 0.07453704912852163, ....]

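The value of the snippet index is np.argmin(profile_areas). In the first case (the naive version), the first two values are exactly tied, np.argmin returns the first minimum, and the snippet index becomes 0. In the second case (the performant version), the second value is slightly smaller, so the snippet index becomes 1. Note that this small change results in a big difference in the snippets.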
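To make the flip concrete, here is a minimal sketch that feeds the values quoted above to np.argmin. Only the first two entries of each array appear in this thread, so the arrays below are truncated to those two entries; the variable names are just for illustration.

```python
import numpy as np

# First two entries of profile_areas as quoted in this thread
# (the remaining entries are elided there, so they are omitted here).
ref_areas = np.array([0.07453704912719797, 0.07453704912719797])
cmp_areas = np.array([0.07453704912853336, 0.07453704912852163])

# Exact tie: np.argmin returns the first occurrence of the minimum.
ref_idx = int(np.argmin(ref_areas))

# Near tie: the second entry is smaller by roughly 1e-14.
cmp_idx = int(np.argmin(cmp_areas))

# The areas themselves agree far beyond 5 decimals.
max_area_diff = float(np.max(np.abs(ref_areas - cmp_areas)))

print(ref_idx, cmp_idx)
print(max_area_diff)
```

So the areas would pass an "almost equal to 5 decimals" check, while the index chosen from them does not match.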

>>> np.max(np.abs(T[0:0+m] - T[10:10+m]))
1546.55831117

which is the "Max absolute difference" we get for this case:

AssertionError: 
Arrays are not almost equal to 5 decimals

Mismatched elements: 10 / 30 (33.3%)
Max absolute difference: 1546.55831117

Next task: To understand where that loss of precision comes from!

Thank you!

@seanlaw
[I couldn't update the original post, so I am saying it here]

Another assertion failure, similar to the one mentioned in the original post: https://github.com/TDAmeritrade/stumpy/actions/runs/5633298262/job/15262019886

update

After further investigation, it seems that the issue is not related to #677.

Previously, we showed that the problem is coming from a very small loss of precision in the elements of the array profile_areas. I noticed that:

(1) This loss of precision is actually the accumulation of several smaller losses. For instance, the element profile_areas[0] is the sum of the elements of the array arr = np.minimum(D[0], Q). The loss of precision in profile_areas[0] is caused by several very small losses of precision in the elements of arr.

(2) This loss of precision is present even when no assertion fails. For instance, let's say the ref array is [1, 1, 2]. Its np.argmin is 0. Now, let's say the cmp array, which carries some loss of precision, is [0.99, 0.998, 1.98]. Note that np.argmin still results in 0 (so all good!). Now, what if we get [0.999, 0.998, 1.99] because of the loss of precision? Now np.argmin becomes 1.
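A toy sketch of point (1), under made-up numbers: the rows and the per-element error err below are hypothetical stand-ins (not values from the failing test), chosen as powers of two so that every sum is exact and the accumulation is easy to read off.

```python
import numpy as np

err = 2.0 ** -40  # hypothetical per-element error (about 9e-13)

# Toy stand-ins for np.minimum(D[i], Q) on two rows (not real values).
row_0 = np.full(8, 0.125)
row_1 = np.full(8, 0.125)

# "Naive" areas: both rows sum to exactly 1.0, so they are tied.
ref_areas = np.array([row_0.sum(), row_1.sum()])

# Pretend the performant version is off by 2 * err per element in row 0
# and by err per element in row 1.
cmp_areas = np.array([(row_0 + 2 * err).sum(), (row_1 + err).sum()])

# The error of each area is the sum of its per-element errors.
area_error = cmp_areas - ref_areas

ref_idx = int(np.argmin(ref_areas))
cmp_idx = int(np.argmin(cmp_areas))

print(area_error / err)  # accumulated error, in units of err
print(ref_idx, cmp_idx)
```

Each area drifts by the sum of its per-element errors, and since the two rows drift by different amounts, the tie is broken in favor of a different index.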

Going deep

I tried to dig deeper, and I noticed that part of the loss of precision is caused by the fact that the mpdist distance between A and B is sometimes not equal to the mpdist distance between B and A.

The function snippets._get_all_profiles computes the distance matrix D

D = np.empty(((n_padded // m) - 1, n_padded - m + 1), dtype=np.float64)

So, D[i, j] is the mpdist distance between T[i * m : i * m + m] and T[j : j + m], using window size s. Note that the start index i * m takes values in [0, m, 2*m, ...], and j is in range(len(T) - m + 1). Therefore:

  • D[i, j] should be equal to D[j // m, i * m] if j % m == 0
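
One way to check this symmetry on a given D is sketched below. The helper max_asymmetry is a hypothetical name (it is not part of stumpy), the toy D is filled with a made-up symmetric function rather than real mpdist values, and n_padded = 70 is an assumption (len(T) = 64 rounded up to a multiple of m = 10).

```python
import numpy as np


def max_asymmetry(D, m):
    """Largest |D[i, j] - D[j // m, i * m]| over all pairs with j % m == 0.

    Hypothetical helper for illustration; not a stumpy function.
    """
    n_rows, n_cols = D.shape
    worst = 0.0
    for i in range(n_rows):
        for j in range(0, n_cols, m):  # only columns with j % m == 0
            r = j // m
            # Skip pairs whose mirrored entry falls outside D.
            if r < n_rows and i * m < n_cols:
                worst = max(worst, abs(D[i, j] - D[r, i * m]))
    return worst


m = 10
n_padded = 70  # assumption: 64 points padded up to a multiple of m
D = np.empty(((n_padded // m) - 1, n_padded - m + 1), dtype=np.float64)

# Toy "distance" that is symmetric by construction (not real mpdist values).
for i in range(D.shape[0]):
    for j in range(D.shape[1]):
        D[i, j] = abs(i * m - j)

symmetric_gap = max_asymmetry(D, m)

# Inject a tiny error into one entry, mimicking a loss of precision.
D[0, 10] += 1e-12
perturbed_gap = max_asymmetry(D, m)

print(symmetric_gap, perturbed_gap)
```

Running the same helper on the D returned by the naive and performant versions would show how large the asymmetry actually is in each.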