Run-length encoding example
marcus-voss opened this issue · comments
Thanks for the implementation. I'd be interested to use it in a project on sparse binary sequences. However, I'm not yet sure about your expected format. You provide these two arrays here as examples:
s = np.array([1, 2, 3, -1, 1])
t = np.array([1, -2, 4, 1])
However, I don't get how these match either the run-length encoding of the paper (then I'd expect numbers highlighted somehow to give an idea of the number of zeros) nor the encoding described as in Wikipedia, as then I'd expect both to be of even length, as it would be alternating between the count and the value (i.e., decoded t would be t = np.array([-2, 1,1,1,1])
, but s would not be clear due to uneven length).
Can you provide an example of the original time series and the encoded ones?
If I'm not mistaken -x
represents a run of x
consecutive zeros. Hence the two time series in the example decode to:
s = [1, 2, 3, 0, 1]
t = [1, 0, 0, 4, 1]
I'm facing the same issue, I can't find a run-length enconding algorithm to match the one in the paper. Regarding @MichaelSchreier's answer, I think that we can't decode it like that, in the example in the paper [1 0 0 4 1]
encodes to [1 (2) 4 1]
, therefore the negative numbers don't seem to encode the consecutive zeros.
I've came up with this solution, I haven't tested it a lot but it seems to be working:
x = [0, 0, 0, 2, 3, 0, 5, 6, 0, 0, 4, 0, 0]
def rle(series):
# convert to np array
series = np.array(series)
# add points to detect inflection on start and end
series_ = np.concatenate(([1], series, [1]))
# find zeros and non zeros
zeros = np.where(series_ == 0)[0]
nonzeros = np.where(series_ != 0)[0]
# detect sequencies
zeros = zeros[np.where(np.diff(zeros) > 1)[0]]
nonzeros = nonzeros[np.where(np.diff(nonzeros) > 1)[0]]
split = np.sort(np.concatenate([zeros, nonzeros]))
# avoid splitting on first element
split = split[split > 0]
series = np.split(series, split)
# initialize empty array
rle = []
rle = np.array(rle, dtype=int)
# encode zeros
for s in series:
if np.sum(s) == 0:
rle = np.append(rle, [len(s)], axis=0)
else:
rle = np.concatenate([rle, s])
return rle
rle(x)
[3 2 3 1 5 6 2 4 2]
After further testing, I've found some use cases where the encoding wasn't correct, so here is an updated version:
def rle(series):
"""
Run length encoding for sparse time series to encode zeros as in needed for awarp calculation
(https://ieeexplore.ieee.org/document/7837859 | https://github.com/mclmza/AWarp)
args
----
series: sparse times series (e.g. x = [0, 0, 0, 2, 3, 0, 5, 6, 0, 0, 4, 0, 0])
returns
----
array with encoded zeros (e.g. [3 2 3 1 5 6 2 4 2]) """
# convert to np array
series = np.array(series)
# add points to detect inflection on start and end
series_ = np.concatenate(([1], series, [1]))
# find zeros and non zeros
zeros = np.where(series_ == 0)[0]
if len(zeros) > 0:
nonzeros = np.where(series_ != 0)[0]
# detect zero sequencies
split_zeros = np.where(np.diff(zeros) > 1)[0] + 1
splitted_zeros = np.split(zeros, split_zeros)
zero_points = []
zero_points = np.array(zero_points, dtype=int)
for z in splitted_zeros:
zero_points = np.append(zero_points, z[-1])
# detect non-zero sequencies
nonzero_points = nonzeros[np.where(np.diff(nonzeros) > 1)[0]]
# concat all splitting points
split = np.sort(np.concatenate([zero_points, nonzero_points]))
# avoid splitting on first element
split = split[split > 0]
# separate zero sequencies from non-zero sequencies
splitted_series = np.split(series, split)
# initialize empty array
rle = []
rle = np.array(rle, dtype=int)
# encode zeros
for s in splitted_series:
# if it is a zero sequence enconde the lenght of the sequence
if np.sum(s) == 0:
rle = np.append(rle, [len(s)], axis=0)
else:
rle = np.concatenate([rle, s])
# remove zeros in the end
rle = rle[rle > 0]
return rle
else:
return series
According to the original source code in c++ where the exact same examples are given negative values indicate zero sequences
https://www.cs.unm.edu/~mueen/Projects/AWarp/
function [d, D] = AWARP(x,y)
%x and y are run-length encoded series where negative integers represent
%length of runs of zeros`