Run-length encoding example

Question

Run-length encoding example

marcus-voss opened this issue 5 years ago · comments

Thanks for the implementation. I'd be interested to use it in a project on sparse binary sequences. However, I'm not yet sure about your expected format. You provide these two arrays here as examples:

s = np.array([1, 2, 3, -1, 1])
t = np.array([1, -2, 4, 1])

However, I don't get how these match either the run-length encoding of the paper (then I'd expect numbers highlighted somehow to give an idea of the number of zeros) nor the encoding described as in Wikipedia, as then I'd expect both to be of even length, as it would be alternating between the count and the value (i.e., decoded t would be t = np.array([-2, 1,1,1,1]), but s would not be clear due to uneven length).

Can you provide an example of the original time series and the encoded ones?

MichaelSchreier · Answer 1 · Fri Mar 13 2020 21:59:25 GMT+0800 (China Standard Time)

If I'm not mistaken -x represents a run of x consecutive zeros. Hence the two time series in the example decode to:

s = [1, 2, 3, 0, 1]
t = [1, 0, 0, 4, 1]

Tiago Santos · Answer 2 · Tue Jul 07 2020 19:01:49 GMT+0800 (China Standard Time)

I'm facing the same issue, I can't find a run-length enconding algorithm to match the one in the paper. Regarding @MichaelSchreier's answer, I think that we can't decode it like that, in the example in the paper [1 0 0 4 1] encodes to [1 (2) 4 1], therefore the negative numbers don't seem to encode the consecutive zeros.

Tiago Santos · Answer 3 · Tue Jul 07 2020 20:23:07 GMT+0800 (China Standard Time)

I've came up with this solution, I haven't tested it a lot but it seems to be working:

x = [0, 0, 0, 2, 3, 0, 5, 6, 0, 0, 4, 0, 0]
def rle(series):
    # convert to np array
    series = np.array(series)

    # add points to detect inflection on start and end
    series_ = np.concatenate(([1], series, [1]))

    # find zeros and non zeros
    zeros = np.where(series_ == 0)[0] 
    nonzeros = np.where(series_ != 0)[0]

    # detect sequencies
    zeros = zeros[np.where(np.diff(zeros) > 1)[0]]
    nonzeros = nonzeros[np.where(np.diff(nonzeros) > 1)[0]]
    
    split = np.sort(np.concatenate([zeros, nonzeros]))

    # avoid splitting on first element
    split = split[split > 0]

    series = np.split(series, split)

    # initialize empty array
    rle = []
    rle = np.array(rle, dtype=int)

    # encode zeros
    for s in series:
        if np.sum(s) == 0:
            rle = np.append(rle, [len(s)], axis=0)
        else:
            rle = np.concatenate([rle, s])    

    return rle

rle(x)
[3 2 3 1 5 6 2 4 2]

Tiago Santos · Answer 4 · Wed Jul 08 2020 19:45:59 GMT+0800 (China Standard Time)

After further testing, I've found some use cases where the encoding wasn't correct, so here is an updated version:

def rle(series):
    """ 
    Run length encoding for sparse time series to encode zeros as in needed for awarp calculation 
    (https://ieeexplore.ieee.org/document/7837859 | https://github.com/mclmza/AWarp)
    
    args
    ----
    series: sparse times series (e.g. x = [0, 0, 0, 2, 3, 0, 5, 6, 0, 0, 4, 0, 0])

    returns
    ---- 
    array with encoded zeros (e.g. [3 2 3 1 5 6 2 4 2]) """

    # convert to np array
    series = np.array(series)

    # add points to detect inflection on start and end
    series_ = np.concatenate(([1], series, [1]))

    # find zeros and non zeros
    zeros = np.where(series_ == 0)[0]

    if len(zeros) > 0:
        nonzeros = np.where(series_ != 0)[0]

        # detect zero sequencies
        split_zeros = np.where(np.diff(zeros) > 1)[0] + 1

        splitted_zeros = np.split(zeros, split_zeros)

        zero_points = []
        zero_points = np.array(zero_points, dtype=int)
        
        for z in splitted_zeros:
            zero_points = np.append(zero_points, z[-1])

        # detect non-zero sequencies
        nonzero_points = nonzeros[np.where(np.diff(nonzeros) > 1)[0]]

        # concat all splitting points
        split = np.sort(np.concatenate([zero_points, nonzero_points]))

        # avoid splitting on first element
        split = split[split > 0]

        # separate zero sequencies from non-zero sequencies
        splitted_series = np.split(series, split)

        # initialize empty array
        rle = []
        rle = np.array(rle, dtype=int)

        # encode zeros
        for s in splitted_series:
            # if it is a zero sequence enconde the lenght of the sequence
            if np.sum(s) == 0:
                rle = np.append(rle, [len(s)], axis=0)
            else:
                rle = np.concatenate([rle, s])

        # remove zeros in the end
        rle = rle[rle > 0]

        return rle
    else:
        return series

Kilian Brachtendorf · Answer 5 · Fri Aug 06 2021 18:49:20 GMT+0800 (China Standard Time)

According to the original source code in c++ where the exact same examples are given negative values indicate zero sequences

https://www.cs.unm.edu/~mueen/Projects/AWarp/

function [d, D] = AWARP(x,y)
%x and y are run-length encoded series where negative integers represent
%length of runs of zeros`