Snippets Unit Test Assertion Failure
seanlaw opened this issue
It appears that the snippets unit tests are failing here, and it's not clear whether this is related to the snippet comment in #828.
It happened here as well.
Edit: additional cases:
https://github.com/TDAmeritrade/stumpy/actions/runs/5633298262/job/15262019886
@NimaSarajpoor The max absolute difference seems to be rather large. Are you able to take a look?
> The max absolute difference seems to be rather large.

Yes, I noticed it as well.

> Are you able to take a look?

I will take a look. It has been on my radar.
```python
# A failure case
import numpy as np

m = 10
k = 3
s = 3
seed = 332
np.random.seed(seed)
T = np.random.uniform(-1000.0, 1000.0, [64])

# test_mpdist_snippets_s_with_isconstant is defined in tests/test_snippets.py
test_mpdist_snippets_s_with_isconstant(T, m, k, s)
```
[WIP]
Despite the large difference between ref and cmp in the test function, this might be caused by a loss of precision!
The snippet index is computed from the array `profile_areas`. At some point, `profile_areas` from the naive version (ref) has the following values:
[0.07453704912719797, 0.07453704912719797, ...]
And the array computed from the performant version (cmp) has the following values:
[0.07453704912853336, 0.07453704912852163, ...]
The snippet index is `np.argmin(profile_areas)`. In the first case (ref), the snippet index becomes 0. In the second case (cmp, the performant version), it becomes 1. Note that this tiny numerical change results in a big difference in the snippets.
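A minimal sketch of the tie-breaking behaviour, using only the first two `profile_areas` values quoted above (the remaining values are elided in the original and are not reproduced here). `fragile_argmin` is a hypothetical diagnostic helper, not part of STUMPY, and its `rtol` default is an arbitrary assumption. It returns the argmin together with a flag saying whether the two smallest areas are close enough that round-off could change the winner.

```python
import numpy as np


def fragile_argmin(areas, rtol=1e-9):
    """Hypothetical helper (not part of STUMPY).

    Returns the index of the smallest area and a flag that is True when the
    two smallest areas agree to within `rtol`, i.e. when a tiny round-off
    error could change which index np.argmin picks.
    """
    areas = np.asarray(areas, dtype=np.float64)
    idx = int(np.argmin(areas))  # ties go to the first occurrence
    if areas.size < 2:
        return idx, False
    # After partitioning around kth=1, the first two entries are the two smallest
    smallest, runner_up = np.partition(areas, 1)[:2]
    fragile = bool(np.isclose(smallest, runner_up, rtol=rtol, atol=0.0))
    return idx, fragile


# First two profile_areas values from the naive (ref) and performant (cmp) versions
ref_areas = [0.07453704912719797, 0.07453704912719797]
cmp_areas = [0.07453704912853336, 0.07453704912852163]

print(fragile_argmin(ref_areas))  # (0, True): exact tie, first index wins
print(fragile_argmin(cmp_areas))  # (1, True): second value is ~1e-14 smaller
```

Both cases are flagged as fragile, which matches the observation that the chosen snippet depends on differences far below the test's 5-decimal tolerance.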
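Since snippet index 0 corresponds to `T[0:m]` and snippet index 1 to `T[m:2*m]` (here `m = 10`), picking the wrong index changes the snippet completely:
```python
>>> np.max(np.abs(T[0:0+m] - T[10:10+m]))
1546.55831117
```
This is exactly the "Max absolute difference" reported for this case:
```
AssertionError:
Arrays are not almost equal to 5 decimals
Mismatched elements: 10 / 30 (33.3%)
Max absolute difference: 1546.55831117
```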
Next task: understand where that loss of precision comes from!
Thank you!
@seanlaw
[I couldn't update the original post, so I am saying it here]
Another assertion failure similar to the one mentioned in the original post: https://github.com/TDAmeritrade/stumpy/actions/runs/5633298262/job/15262019886
Update
After further investigation, it seems that the issue is not related to #677.
Previously, we showed that the problem comes from a very small loss of precision in the elements of the array `profile_areas`. I noticed that:
(1) This loss of precision is actually an accumulation of several smaller losses of precision. For instance, the element `profile_areas[0]` is the sum of the elements of the array `arr = np.minimum(D[0], Q)`. The loss of precision in `profile_areas[0]` is caused by several very small losses of precision in the elements of `arr`.
(2) This loss of precision is present even when no assertion failure occurs. For instance, let's say the ref array is `[1, 1, 2]`. Its `np.argmin` is 0. Now, let's say the cmp array, which contains a loss of precision, is `[0.99, 0.998, 1.98]`. Note that `np.argmin` still results in 0 (so all good!). Now, what if we get `[0.999, 0.998, 1.99]` because of the loss of precision? Now `np.argmin` becomes 1.
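To see how small per-element errors in `arr = np.minimum(D[0], Q)` add up in `profile_areas[0]` and can flip the selected snippet, here is a simplified sketch of a greedy selection loop that picks the row of `D` with the smallest profile area. It assumes `Q` starts as an array of `np.inf` and is updated with `np.minimum` after each pick; `select_snippet_indices` and the toy `D` are hypothetical and are not STUMPY's actual code or data. The perturbation size (1e-12) was chosen to be comparable to the differences between the ref and cmp values quoted earlier.

```python
import numpy as np


def select_snippet_indices(D, k):
    """Hypothetical greedy sketch: pick k rows of D by minimum profile area."""
    Q = np.full(D.shape[1], np.inf)  # running minimum over the chosen profiles
    chosen = []
    for _ in range(k):
        # profile_areas[i] is the sum of the elements of np.minimum(D[i], Q)
        profile_areas = np.sum(np.minimum(D, Q), axis=1)
        idx = int(np.argmin(profile_areas))
        chosen.append(idx)
        Q = np.minimum(D[idx], Q)
    return chosen


# Toy distance matrix whose first two rows tie exactly (like ref)
D_ref = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 5.0, 5.0, 5.0],
])

# Simulate a tiny round-off error in every element of row 0 (like cmp)
D_cmp = D_ref.copy()
D_cmp[0] += 1e-12

print(select_snippet_indices(D_ref, 1))  # [0]: the tie goes to the first row
print(select_snippet_indices(D_cmp, 1))  # [1]: row 0's area grew by ~4e-12
```

Each individual error is negligible, but their sum is enough to break the tie, which is the same mechanism described in (1) and (2).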
Going deep
I dug deeper and noticed that part of the loss of precision comes from the fact that the mpdist distance between A and B is sometimes not equal to the mpdist distance between B and A, due to floating-point round-off.
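One general reason why a distance computed as d(A, B) and as d(B, A) can differ in the last few bits is that floating-point addition is not associative: if the two directions accumulate the same terms in a different order, the rounded results can differ. A minimal illustration in plain Python (this does not reproduce STUMPY's mpdist internals):

```python
import math

# The same three terms, summed in two different orders
forward = (0.1 + 0.2) + 0.3
backward = (0.3 + 0.2) + 0.1

print(forward)                          # 0.6000000000000001
print(backward)                         # 0.6
print(forward == backward)              # False
print(math.isclose(forward, backward))  # True: they differ only by round-off
```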
The function `snippets._get_all_profiles` computes the distance matrix `D` (line 114 in commit 1ddb950).
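So, `D[i, j]` is the mpdist distance between `T[i * m : i * m + m]` and `T[j : j + m]`, using window size `s`. Note that `i * m` is in `[0, m, 2*m, ...]`, and `j` is in `range(len(T) - m + 1)`. Therefore, `D[i, j]` should be equal to `D[j // m, i * m]` if `j % m == 0`, because in that case `T[(j // m) * m : (j // m) * m + m]` is exactly `T[j : j + m]`, so both entries compare the same two subsequences in opposite order.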
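The symmetry condition above can be checked mechanically. Below is a self-contained sketch that assumes a simplified, unpadded layout in which row `i` of `D` corresponds to `T[i * m : i * m + m]` and column `j` to `T[j : j + m]`. Plain Euclidean distance stands in for mpdist, an assumption made only to keep the example short; because it is computed identically in both directions, the measured asymmetry is exactly zero, whereas a non-zero value for the real `D` would point to the round-off discussed above. `euclidean`, `all_profiles`, and `max_asymmetry` are hypothetical names, not STUMPY functions, and `T` is random toy data.

```python
import numpy as np


def euclidean(a, b):
    # Stand-in distance (NOT mpdist); symmetric bit-for-bit in a and b
    return float(np.sqrt(np.sum((a - b) ** 2)))


def all_profiles(T, m, dist):
    # Hypothetical simplified D (no padding):
    # row i <-> T[i * m : i * m + m], column j <-> T[j : j + m]
    n_rows = len(T) // m
    n_cols = len(T) - m + 1
    D = np.empty((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            D[i, j] = dist(T[i * m : i * m + m], T[j : j + m])
    return D


def max_asymmetry(D, m):
    # Largest |D[i, j] - D[j // m, i * m]| over all j with j % m == 0
    n_rows, n_cols = D.shape
    worst = 0.0
    for i in range(n_rows):
        if i * m >= n_cols:
            continue
        for j in range(0, n_cols, m):  # only j that are multiples of m
            if j // m < n_rows:
                worst = max(worst, abs(D[i, j] - D[j // m, i * m]))
    return worst


m = 10
T = np.random.default_rng(0).uniform(-1000.0, 1000.0, 64)
D = all_profiles(T, m, euclidean)
print(D.shape)              # (6, 55)
print(max_asymmetry(D, m))  # 0.0 for this exactly symmetric stand-in
```

Running `max_asymmetry` on the `D` returned by the real `_get_all_profiles` (after accounting for its padding) would show how large the mpdist asymmetry actually is.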