johannfaouzi / pyts

A Python package for time series classification

Home Page:https://pyts.readthedocs.io

Question about the 'strategy' parameter in SymbolicAggregateApproximation()

GiovannaR opened this issue

Description

Hi!

I have a question about how SymbolicAggregateApproximation() compares with its description in the article "Experiencing SAX: a novel symbolic representation of time series". Section 3.2, "Discretization", states that the data follows a Gaussian distribution and that the "breakpoints" are chosen to produce equal-sized areas under the Gaussian curve. So, as I understand it, the parameter strategy='normal' uses the same strategy as the article, right? And what about the 'uniform' and 'quantile' strategies? Do they depart from the article?

Thank you for your help! Have a nice day!

Hi,

Indeed, the parameter strategy='normal' uses the same strategy as the article (quantiles from the standard normal distribution). The justification for using quantiles from the standard normal distribution is given in the article:

"[...] the normalized time series have highly Gaussian distribution [...]."

strategy='uniform' and strategy='quantile' actually use the values of the time series. strategy='uniform' creates bins of the same width (it takes the minimum and maximum values of the time series and splits that range into K bins of equal width). strategy='quantile' is similar to strategy='normal', but instead of using the quantiles of the standard normal distribution, it uses the quantiles of the time series (so that all the symbols have (almost) the same number of occurrences).
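To make the difference between the three strategies concrete, here is a minimal, dependency-free sketch of how the breakpoints could be computed for a single time series. It is not the pyts implementation: the helper names `bin_edges` and `discretize`, the interpolation rule used for the quantiles, and the handling of values that fall exactly on a breakpoint are all assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist, quantiles


def bin_edges(series, n_bins, strategy):
    """Return the n_bins - 1 interior breakpoints for one time series.

    Hypothetical helper for illustration, not part of pyts.
    """
    if strategy == "normal":
        # Quantiles of N(0, 1): the breakpoints ignore the data entirely.
        std_normal = NormalDist()
        return [std_normal.inv_cdf(k / n_bins) for k in range(1, n_bins)]
    if strategy == "uniform":
        # Split [min, max] into n_bins intervals of equal width.
        lo, hi = min(series), max(series)
        width = (hi - lo) / n_bins
        return [lo + k * width for k in range(1, n_bins)]
    if strategy == "quantile":
        # Empirical quantiles of the series itself (linear interpolation).
        return quantiles(series, n=n_bins, method="inclusive")
    raise ValueError(f"unknown strategy: {strategy!r}")


def discretize(series, n_bins, strategy):
    """Map each value to a letter according to the bin it falls into."""
    edges = bin_edges(series, n_bins, strategy)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return "".join(alphabet[bisect_right(edges, x)] for x in series)


series = [-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
print(bin_edges(series, 4, "uniform"))    # [0.0, 1.0, 2.0]
print(discretize(series, 4, "uniform"))   # abbccddd
print(discretize(series, 4, "quantile"))  # aabbccdd
```

With 'quantile', every symbol occurs twice in this toy series, whereas 'uniform' gives unequal counts because the values are not evenly spread over the range.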

Note that the dimensionality reduction with Piecewise Aggregate Approximation is not included in this implementation, so you should apply pyts.approximation.PiecewiseAggregateApproximation first if you want it. To standardize time series, you can use pyts.preprocessing.StandardScaler (strategy='normal' assumes that the time series are standardized).
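The two preprocessing steps mentioned above, standardization followed by Piecewise Aggregate Approximation, can be sketched in plain Python as follows. This only illustrates the idea and the order of operations; it is not how pyts implements them. The function names `standardize` and `paa` are hypothetical, and the use of the population standard deviation and of non-overlapping windows are assumptions.

```python
from statistics import fmean, pstdev


def standardize(series):
    """Rescale one time series to zero mean and unit variance.

    Assumes the population standard deviation and a non-constant series.
    """
    mu = fmean(series)
    sigma = pstdev(series)
    return [(x - mu) / sigma for x in series]


def paa(series, window_size):
    """Average the series over consecutive, non-overlapping windows.

    If the length is not a multiple of window_size, the last window is
    simply shorter (a simplification made for this sketch).
    """
    return [
        fmean(series[start:start + window_size])
        for start in range(0, len(series), window_size)
    ]


raw = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0]
reduced = paa(standardize(raw), window_size=2)
print(len(raw), len(reduced))  # 8 4
```

The reduced series is what would then be passed to the discretization step.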

Best,
Johann

Hi @johannfaouzi

Thanks for your comments. I have a follow-up question regarding the normalization of a time series. Do we really need to normalize the time series prior to computing the SymbolicAggregateApproximation?

Ivan

Hi,

It will depend on the 'strategy' used to discretize the time series with SymbolicAggregateApproximation:

  • 'normal' uses quantiles from the standard normal distribution, so it is assumed that the time series is standardized (zero mean, unit variance): any normalization may have an impact.
  • 'quantile' is invariant to any strictly increasing transformation because the order of the values remains identical: normalization has no impact.
  • 'uniform' is invariant to any strictly increasing affine transformation (a * x + b with a > 0), so most common normalization techniques have no impact.

>>> import numpy as np
>>> from pyts.approximation import SymbolicAggregateApproximation
>>> from pyts.datasets import load_gunpoint

>>> # Keep only the training time series.
>>> X, _, _, _ = load_gunpoint(return_X_y=True)

>>> # np.all replaces np.alltrue, which was removed in NumPy 2.0.
>>> # 'normal': fixed breakpoints, so any rescaling can change the symbols.
>>> sax_normal = SymbolicAggregateApproximation(strategy='normal')
>>> np.all(sax_normal.transform(X) == sax_normal.transform(2 * X + 6))
False
>>> np.all(sax_normal.transform(X) == sax_normal.transform(np.exp(X)))
False

>>> # 'quantile': invariant to any strictly increasing transformation.
>>> sax_quantile = SymbolicAggregateApproximation(strategy='quantile')
>>> np.all(sax_quantile.transform(X) == sax_quantile.transform(2 * X + 6))
True
>>> np.all(sax_quantile.transform(X) == sax_quantile.transform(np.exp(X)))
True

>>> # 'uniform': invariant to increasing affine maps, but not to np.exp.
>>> sax_uniform = SymbolicAggregateApproximation(strategy='uniform')
>>> np.all(sax_uniform.transform(X) == sax_uniform.transform(2 * X + 6))
True
>>> np.all(sax_uniform.transform(X) == sax_uniform.transform(np.exp(X)))
False
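
The behaviour of 'uniform' can also be checked by hand. The short derivation below is a sketch under the assumption that the breakpoints are equally spaced between the minimum and the maximum of each time series; the symbols $m$, $M$, $K$, $b_k$, $a$ and $c$ are notation introduced here, not taken from pyts.

```latex
% Breakpoints for K bins, with m = \min_t x_t and M = \max_t x_t:
b_k = m + \frac{k}{K}\,(M - m), \qquad k = 1, \dots, K - 1.

% Apply y_t = a x_t + c with a > 0. Then m' = a m + c and M' = a M + c, so
b'_k = m' + \frac{k}{K}\,(M' - m') = a\,b_k + c.

% Since a > 0, comparisons with the breakpoints are preserved:
y_t < b'_k \iff a x_t + c < a\,b_k + c \iff x_t < b_k.
```

Every value therefore falls into the same bin before and after the transformation. For a nonlinear increasing map such as the exponential, the new breakpoints are in general not the images of the old ones, which is why the last comparison above can return False.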

Best,
Johann

Hi @johannfaouzi

Thanks for the explanations

Ivan