Question about the input to compute SymbolicAggregateApproximation
ivan-marroquin opened this issue · comments
Hi,
Thanks for making this great package available!
The input data is expected to have shape "n samples" x "n time stamps" and to contain univariate time series. I have only one time series, and I tried SymbolicAggregateApproximation in two ways:
a) First scenario
import numpy as np
from pyts.approximation import SymbolicAggregateApproximation
X = np.array([0, 4, 2, 1, 7, 6, 3, 5]).reshape(-1, 1)
transformer = SymbolicAggregateApproximation()
print(transformer.transform(X))
I get this result:
home/ivan_phd/python_3.9.0/lib/python3.9/site-packages/pyts/preprocessing/discretizer.py:168: UserWarning: Some quantiles are equal. The number of bins will be smaller for sample [0 1]. Consider decreasing the number of bins or removing these samples.
warn("Some quantiles are equal. The number of bins will "
[['a']
['a']
['a']
['a']
['a']
['a']
['a']
['a']]
b) Second scenario
X = np.array([[0, 4, 2, 1, 7, 6, 3, 5], [0, 4, 2, 1, 7, 6, 3, 5]])
transformer = SymbolicAggregateApproximation()
print(transformer.transform(X))
I get this result:
[['a' 'c' 'b' 'a' 'd' 'd' 'b' 'c']
['a' 'c' 'b' 'a' 'd' 'd' 'b' 'c']]
My questions are:
- Do I need to duplicate the time series to get the expected result?
- What is the meaning of 'n time stamps' for input data?
Thanks,
Ivan
Hi,
- What you want is scenario A, but the reshaping is wrong: if you have only one time series (i.e., one sample), you need to reshape your 1D array as a 2D array with one row:
X = np.array([0, 4, 2, 1, 7, 6, 3, 5]).reshape(1, -1)
- n_timestamps is the number of time points (values) in each time series. In your example, your time series has 8 values (n_timestamps=8).
This convention is used because one needs a set of samples (and not just one sample) to perform machine learning, which is why the input is assumed to be a set of univariate time series (2D array).
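To make the shape convention concrete, here is a minimal sketch comparing the two reshapes on the series from the question. The variable names (`series`, `as_column`, `as_row`) are illustrative only. The pyts call is guarded so the snippet still runs if pyts is not installed, and it assumes the default parameters of SymbolicAggregateApproximation.

```python
import numpy as np

series = np.array([0, 4, 2, 1, 7, 6, 3, 5])

# reshape(-1, 1): 8 rows, 1 column.
# Read as 8 samples with a single time point each, so every sample
# has only one distinct value to bin (hence the "quantiles are equal" warning).
as_column = series.reshape(-1, 1)

# reshape(1, -1): 1 row, 8 columns.
# Read as 1 sample with 8 time points, which is the intended input.
as_row = series.reshape(1, -1)

print(as_column.shape)  # (8, 1)
print(as_row.shape)     # (1, 8)

n_samples, n_timestamps = as_row.shape

# Optional: run the transformer only if pyts is available.
try:
    from pyts.approximation import SymbolicAggregateApproximation
except ImportError:
    SymbolicAggregateApproximation = None

if SymbolicAggregateApproximation is not None:
    sax = SymbolicAggregateApproximation()
    # One row of symbols is returned for the single time series.
    print(sax.fit_transform(as_row))
```

With the one-row input, no duplication of the series is needed: the transformer bins the 8 values of the single sample, as it did for each row in the second scenario.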
Hope this helps you a bit and do not hesitate to ask more questions if needed.
Best,
Johann
Thanks for your quick response.
Ivan