Update documentation regarding Euclidean distance instead of Pearson's correlation
lucaspg96 opened this issue
According to the docs, the `stumpy.stump` method:
Compute the z-normalized matrix profile
This is a convenience wrapper around the Numba JIT-compiled parallelized
`_stump` function which computes the matrix profile according to STOMPopt with
Pearson correlations.
However, computing the matrix profile over a time series (`from scipy.misc import electrocardiogram`) returns only positive values, and they are greater than 1. I boxplotted them:
@lucaspg96 Thank you for your question. I can understand the confusion. However, the mention of "Pearson correlations" refers to the "STOMPopt" algorithm, which first computes the Pearson correlation and then converts it to a z-normalized Euclidean distance. This is in contrast to the "STOMP" algorithm, which computes the z-normalized Euclidean distance directly, without computing the Pearson correlation first.
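To make the conversion concrete, here is a rough sketch in plain NumPy (not STUMPY's actual code; the helper names `znorm`, `znorm_euclidean`, and `pearson_to_distance` are made up for illustration). For two subsequences of length `m` with Pearson correlation `rho`, the z-normalized Euclidean distance equals `sqrt(2 * m * (1 - rho))`, assuming the z-normalization uses the population standard deviation. Since `rho` lies in [-1, 1], the distance lies in [0, 2 * sqrt(m)], so non-negative values larger than 1 are expected.

```python
import numpy as np


def znorm(x):
    # z-normalize with the population standard deviation (ddof=0)
    return (x - x.mean()) / x.std()


def znorm_euclidean(a, b):
    # Direct route: z-normalize both subsequences, then take the Euclidean distance
    return np.linalg.norm(znorm(a) - znorm(b))


def pearson_to_distance(rho, m):
    # Indirect route: convert a Pearson correlation into the same distance.
    # max(..., 0.0) guards against a tiny negative value caused by rounding.
    return np.sqrt(max(2.0 * m * (1.0 - rho), 0.0))


rng = np.random.default_rng(0)
m = 50  # subsequence (window) length, chosen arbitrarily for this sketch
a = rng.normal(size=m)
b = rng.normal(size=m)

rho = np.corrcoef(a, b)[0, 1]
d_direct = znorm_euclidean(a, b)
d_from_rho = pearson_to_distance(rho, m)

# Both routes agree up to floating-point error
print(rho, d_direct, d_from_rho)
```

Both routes give the same number, which is why an algorithm can work with correlations internally and still return distances.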
Fundamentally, a "matrix profile" contains some sort of distance, but which distance depends on the values of the `normalize` and/or `p` parameters (i.e., it isn't always the z-normalized Euclidean distance, though that is the default).
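As a small illustration of how those two parameters change the distance between a single pair of subsequences, here is a NumPy sketch. `subsequence_distance` is a hypothetical helper, not a STUMPY function, and it assumes that `p` selects a Minkowski p-norm that only applies when `normalize` is `False`.

```python
import numpy as np


def subsequence_distance(a, b, normalize=True, p=2.0):
    """Hypothetical helper (not part of STUMPY) comparing two subsequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if normalize:
        # z-normalize each subsequence; p is ignored in this branch (assumption)
        a = (a - a.mean()) / a.std()
        b = (b - b.mean()) / b.std()
        p = 2.0
    # Minkowski p-norm of the element-wise difference
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)


a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])  # same shape as a, different scale

d_z = subsequence_distance(a, b)                            # z-normalized Euclidean
d_l2 = subsequence_distance(a, b, normalize=False)          # plain Euclidean (p=2)
d_l1 = subsequence_distance(a, b, normalize=False, p=1.0)   # Manhattan (p=1)

print(d_z, d_l2, d_l1)
```

The z-normalized distance is (numerically) zero here because the two subsequences have the same shape, while the non-normalized distances are large because the scales differ.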
Perhaps it would be helpful to say:
Compute the matrix profile (default z-normalized Euclidean distance)
@lucaspg96 Are you able to provide some feedback?