Maghoumi / pytorch-softdtw-cuda

Fast CUDA implementation of (differentiable) soft dynamic time warping for PyTorch


Distance function and comparison with no warping

cfrancesco opened this issue · comments

Hi,
I am trying to use an L1 distance instead of L2.
Currently I have only changed line 321 in soft_dtw_cuda.py:

    def _calc_distance_matrix(self, x, y):
        """
        Calculates the L1 (Manhattan) distance between each element in x and y per timestep
        """
        n = x.size(1)
        m = y.size(1)
        d = x.size(2)
        x = x.unsqueeze(2).expand(-1, n, m, d)
        y = y.unsqueeze(1).expand(-1, n, m, d)
        return torch.abs(x - y).sum(3)
        # return torch.pow(x - y, 2).sum(3)  # original squared Euclidean distance

is this correct or is there more to be changed?
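For reference, here is a quick standalone sanity check of the change, comparing the same expansion against torch.cdist with p=1 (the function name below is just for illustration, not code from the repo):

    import torch

    def calc_l1_distance_matrix(x, y):
        # Same expansion as in _calc_distance_matrix above, returning
        # the pairwise L1 distance per timestep.
        n, m, d = x.size(1), y.size(1), x.size(2)
        x = x.unsqueeze(2).expand(-1, n, m, d)
        y = y.unsqueeze(1).expand(-1, n, m, d)
        return torch.abs(x - y).sum(3)

    x = torch.randn(4, 10, 3)   # (batch, len_x, dims)
    y = torch.randn(4, 12, 3)   # (batch, len_y, dims)
    assert torch.allclose(calc_l1_distance_matrix(x, y), torch.cdist(x, y, p=1), atol=1e-5)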

Also, I'm comparing against the non-warped loss (i.e. mean(abs(X - Y))), and the DTW loss is much higher.
I would expect the DTW loss to be roughly less than or equal to the plain L1 loss (averaging over timesteps), or at least comparable, but I get values that are 2 or even 3 orders of magnitude higher. What am I missing?

Your change to the cost function looks correct to me.

I'm comparing the non warped loss (i.e. mean(abs(X-Y))) and the dtw loss is much higher.

Can you elaborate a bit more on what you mean by that?

Given two sequences A, B (of length n), I compute the standard L1 loss as sum(abs(A - B)) / n.
Now, by warping I would expect a lower average per-timestep distance: assuming I expand the two series into A' and B' (as in the many-to-many mapping in DTW) of length n' > n, I would expect that sum(abs(A' - B')) / n' < sum(abs(A - B)) / n, and dtw(A, B) to be closely related to sum(abs(A' - B')) / n'. What I find instead is that dtw(A, B) is very roughly on the order of L1(A, B)^2 (very loosely, with very noisy sequences).
Basically I wanted to compute the "degree of warping" as 1 - dtw(A, B) / L1(A, B), which, assuming that dtw(A, B) < L1(A, B), would be between 0 and 1. However this seems far from true, even taking the square of L1(A, B).
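The kind of comparison I'm doing looks roughly like this (a minimal sketch, assuming the SoftDTW class from soft_dtw_cuda.py; the constructor arguments shown are illustrative and may need adjusting):

    import torch
    from soft_dtw_cuda import SoftDTW

    # Two random sequences: batch of 1, 50 timesteps, 2 dims
    A = torch.randn(1, 50, 2)
    B = torch.randn(1, 50, 2)

    l1 = torch.abs(A - B).mean()       # plain, non-warped L1 loss (per-timestep mean)
    sdtw = SoftDTW(use_cuda=False, gamma=0.1)
    dtw = sdtw(A, B)                   # soft-DTW value: accumulated cost along the
                                       # alignment path, not a per-timestep average

    print(float(l1), float(dtw))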

Thanks for the clarification.

If I understood your description correctly, I think the huge difference in the result is expected, because the returned value from sDTW actually is logarithmized (see this related response from the original authors).
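For reference, the aggregation that sDTW uses is a soft-min in log-space rather than a hard min; a minimal standalone sketch (not code from this repo):

    import torch

    def softmin(costs, gamma):
        # soft-min_gamma(a_1..a_k) = -gamma * log(sum_i exp(-a_i / gamma))
        return -gamma * torch.logsumexp(-costs / gamma, dim=-1)

    costs = torch.tensor([1.0, 2.0, 3.0])
    print(softmin(costs, gamma=1.0))   # ~0.59, below the hard minimum of 1.0
    print(softmin(costs, gamma=0.01))  # ~1.00, approaches the hard min as gamma -> 0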

The global alignment kernel normalization trick that the author has referenced in his response above can help obtain sDTW values that are always greater than 0, but I don't know what else needs to be done to correctly scale the resulting sDTW value between 0 and 1, if that's what you're after in this case.
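In case it helps, here is a sketch of that normalization (again assuming the SoftDTW class from soft_dtw_cuda.py; constructor arguments are illustrative):

    import torch
    from soft_dtw_cuda import SoftDTW

    sdtw = SoftDTW(use_cuda=False, gamma=0.1)

    def normalized_sdtw(x, y):
        # D(x, y) - 1/2 * (D(x, x) + D(y, y)); non-negative, and 0 when x == y
        return sdtw(x, y) - 0.5 * (sdtw(x, x) + sdtw(y, y))

    x = torch.randn(2, 15, 3)
    y = torch.randn(2, 20, 3)
    print(normalized_sdtw(x, y))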