TDAmeritrade / stumpy

STUMPY is a powerful and scalable Python library for modern time series analysis

Home Page: https://stumpy.readthedocs.io/en/latest/

Update documentation regarding Euclidean distance instead of Pearson's correlation

lucaspg96 opened this issue

According to the docs, the `stumpy.stump` function does the following:

Compute the z-normalized matrix profile
    
    This is a convenience wrapper around the Numba JIT-compiled parallelized
    `_stump` function which computes the matrix profile according to STOMPopt with
    Pearson correlations.

However, when I compute the matrix profile of a time series (`from scipy.misc import electrocardiogram`), it returns only positive values, and they are greater than 1. That doesn't fit a Pearson correlation, whose range is [-1, 1]. I boxplotted them:

[Image: boxplot of the matrix profile values]

@lucaspg96 Thank you for your question. I can understand the confusion. However, the mention of "Pearson correlation" refers to the "STOMPopt" algorithm, which first computes the Pearson correlation and then converts it to a z-normalized Euclidean distance. This is in contrast to the "STOMP" algorithm, which computes the z-normalized Euclidean distance directly, without computing the Pearson correlation first.
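The relationship between the two routes can be sketched in a few lines of NumPy. For two length-m subsequences with Pearson correlation ρ, the z-normalized Euclidean distance is d = sqrt(2m(1 - ρ)), so d ranges from 0 (ρ = 1) to 2√m (ρ = -1). That is why matrix profile values are non-negative and can be well above 1. The helpers `pearson_to_znorm_distance` and `znorm` are hypothetical names, and this is not STUMPY's internal code; it only checks that both routes give the same number, assuming z-normalization with the population standard deviation (ddof=0).

```python
import numpy as np


def pearson_to_znorm_distance(rho, m):
    # Hypothetical helper: convert the Pearson correlation of two length-m
    # subsequences into their z-normalized Euclidean distance.
    # Clipping guards against tiny negative values from rounding when rho ~ 1.
    return np.sqrt(np.clip(2.0 * m * (1.0 - rho), 0.0, None))


def znorm(x):
    # Z-normalize with the population standard deviation (ddof=0),
    # which is the convention the identity above relies on.
    return (x - x.mean()) / x.std()


rng = np.random.default_rng(0)
m = 50
a = rng.standard_normal(m)
b = rng.standard_normal(m)

# Route 1 (STOMPopt-style): compute the Pearson correlation, then convert it.
rho = np.corrcoef(a, b)[0, 1]
d_from_rho = pearson_to_znorm_distance(rho, m)

# Route 2 (STOMP-style): z-normalize, then take the Euclidean distance directly.
d_direct = np.linalg.norm(znorm(a) - znorm(b))

print("rho:", rho)
print("distance via Pearson:", d_from_rho)
print("distance computed directly:", d_direct)
# Distances lie in [0, 2 * sqrt(m)], so values above 1 are expected.
print("upper bound:", 2 * np.sqrt(m))
```

For uncorrelated subsequences (ρ near 0) the distance is close to sqrt(2m), which is already far above 1 for any realistic window size.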

Fundamentally, a "matrix profile" contains some kind of distance, but which distance depends on the values of the `normalize` and/or `p` parameters (i.e., it isn't always the z-normalized Euclidean distance, though that is the default).
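To illustrate how these parameters change what the matrix profile contains, here is a naive brute-force self-join sketch. The function `naive_matrix_profile` is hypothetical and O(n²·m), not STUMPY's implementation. The `ceil(m / 4)` exclusion zone and the rule that `p` is ignored when `normalize=True` are assumptions meant to mirror STUMPY's documented defaults, and the sketch assumes there are no constant subsequences (their standard deviation would be zero).

```python
import numpy as np


def naive_matrix_profile(T, m, normalize=True, p=2.0):
    # Hypothetical brute-force reference, NOT STUMPY's implementation.
    # Self-join: for each length-m subsequence, the distance to its nearest
    # non-trivial neighbor in the same series.
    T = np.asarray(T, dtype=float)
    n = len(T) - m + 1
    # All subsequences as the rows of an (n, m) view.
    S = np.lib.stride_tricks.sliding_window_view(T, m)
    if normalize:
        # Z-normalize every subsequence; p is ignored (assumed to mirror STUMPY),
        # so the result is the z-normalized Euclidean distance.
        S = (S - S.mean(axis=1, keepdims=True)) / S.std(axis=1, keepdims=True)
        p = 2.0
    # Assumed exclusion zone of ceil(m / 4) to skip trivial (self) matches.
    excl = int(np.ceil(m / 4))
    P = np.full(n, np.inf)
    for i in range(n):
        # Minkowski p-norm distance from subsequence i to every subsequence.
        D = np.sum(np.abs(S - S[i]) ** p, axis=1) ** (1.0 / p)
        D[max(0, i - excl): i + excl + 1] = np.inf
        P[i] = D.min()
    return P


rng = np.random.default_rng(42)
T = rng.standard_normal(200)
m = 20

# Default: z-normalized Euclidean distance, bounded by 2 * sqrt(m).
P_znorm = naive_matrix_profile(T, m)
# Non-normalized Manhattan (p=1) distance on the raw values.
P_l1 = naive_matrix_profile(T, m, normalize=False, p=1.0)

print("z-normalized Euclidean:", P_znorm.min(), P_znorm.max())
print("raw Manhattan:", P_l1.min(), P_l1.max())
```

With STUMPY installed, the comparable calls should be `stumpy.stump(T, m)` and `stumpy.stump(T, m, normalize=False, p=1.0)`, with the distances in the first column of the returned array.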

Perhaps it would be helpful to say:

Compute the matrix profile (default z-normalized Euclidean distance)

@lucaspg96 Are you able to provide some feedback?