[BUG?] Non-compact split logic when the same character repeats
loynoir opened this issue
Given
diffChars('1 aab 2','1 zzb 2')
[
{
"count": 2,
"value": "1 "
},
{
"count": 2,
"removed": true,
"value": "aa"
},
{
"count": 2,
"added": true,
"value": "zz"
},
{
"count": 3,
"value": "b 2"
}
]
Expected
diffChars('1 aab 2','1 bbb 2')
[
{
"count": 2,
"value": "1 "
},
{
"count": 2,
"removed": true,
"value": "aa"
},
{
"count": 2,
"added": true,
"value": "bb"
},
{
"count": 3,
"value": "b 2"
}
]
Actual
diffChars('1 aab 2','1 bbb 2')
[
{
"count": 2,
"value": "1 "
},
{
"count": 2,
"removed": true,
"value": "aa"
},
{
"count": 1,
"value": "b"
},
{
"count": 2,
"added": true,
"value": "bb"
},
{
"count": 2,
"value": " 2"
}
]
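A small self-contained check makes the comparison concrete. It is plain JavaScript, not jsdiff code. The helper names `oldText`, `newText` and `indelCost` are made up for illustration, and `count` is omitted from the change objects for brevity. The check confirms that the Expected and Actual change lists both turn '1 aab 2' into '1 bbb 2' and both cost the same number of inserted plus deleted characters.

```javascript
// Illustrative helpers (hypothetical, not part of jsdiff).
// Old text = unchanged + removed parts; new text = unchanged + added parts.
function oldText(changes) {
  return changes.filter((c) => !c.added).map((c) => c.value).join('');
}
function newText(changes) {
  return changes.filter((c) => !c.removed).map((c) => c.value).join('');
}
// Cost under an insert/delete-only model: total inserted + deleted characters.
function indelCost(changes) {
  return changes
    .filter((c) => c.added || c.removed)
    .reduce((sum, c) => sum + c.value.length, 0);
}

// The Expected and Actual outputs from this issue, without `count`.
const expected = [
  { value: '1 ' },
  { value: 'aa', removed: true },
  { value: 'bb', added: true },
  { value: 'b 2' },
];
const actual = [
  { value: '1 ' },
  { value: 'aa', removed: true },
  { value: 'b' },
  { value: 'bb', added: true },
  { value: ' 2' },
];

for (const [name, changes] of [['expected', expected], ['actual', actual]]) {
  console.log(name, oldText(changes), '->', newText(changes), 'cost', indelCost(changes));
}
// expected 1 aab 2 -> 1 bbb 2 cost 4
// actual 1 aab 2 -> 1 bbb 2 cost 4
```

Both lists are valid diffs of the same cost, so neither is "more optimal" than the other when only insertions and deletions count.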
Additional
If this is not a bug, it would be nice to have an option to split at the last b rather than the first one, i.e. to keep the last b of "bbb" as the unchanged character.
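As a rough sketch of what such an option might do, the hypothetical post-processing helper `slideInsertionsLeft` below (not part of jsdiff's API) rewrites a change list: whenever unchanged text is immediately followed by an insertion that ends with the same character, it moves that character from before the insertion to after it. This works because unchanged "X+c" followed by inserted "Y+c" spells the same new text as inserted "c+Y" followed by unchanged "c". Change objects mimic jsdiff's `{ value, added, removed }` shape, with `count` omitted as an assumption for brevity.

```javascript
// Hypothetical helper, NOT part of jsdiff: slide each insertion as far left
// as possible across the unchanged text immediately before it.
function slideInsertionsLeft(changes) {
  const out = changes.map((c) => ({ ...c }));
  for (let i = 0; i + 1 < out.length; i++) {
    const common = out[i];
    const ins = out[i + 1];
    if (common.added || common.removed || !ins.added) continue;
    let moved = '';
    // While the unchanged run and the insertion end with the same character,
    // rotate that character out of the unchanged run to after the insertion.
    while (
      common.value.length > 0 &&
      common.value[common.value.length - 1] === ins.value[ins.value.length - 1]
    ) {
      const ch = common.value[common.value.length - 1];
      common.value = common.value.slice(0, -1);
      ins.value = ch + ins.value.slice(0, -1);
      moved = ch + moved;
    }
    if (moved) out.splice(i + 2, 0, { value: moved });
  }
  return mergeAdjacent(out.filter((c) => c.value.length > 0));
}

// Merge neighbouring changes of the same kind (e.g. two unchanged runs).
function mergeAdjacent(changes) {
  const kind = (c) => (c.added ? 'added' : c.removed ? 'removed' : 'common');
  const merged = [];
  for (const c of changes) {
    const last = merged[merged.length - 1];
    if (last && kind(last) === kind(c)) last.value += c.value;
    else merged.push({ ...c });
  }
  return merged;
}

// The Actual output from this issue, without `count`.
const actualChanges = [
  { value: '1 ' },
  { value: 'aa', removed: true },
  { value: 'b' },
  { value: 'bb', added: true },
  { value: ' 2' },
];
console.log(JSON.stringify(slideInsertionsLeft(actualChanges)));
// [{"value":"1 "},{"value":"aa","removed":true},{"value":"bb","added":true},{"value":"b 2"}]
```

This reproduces the Expected split, but it is a fixed left-slide heuristic rather than a rule about character positions, so it will not always keep the character that sits at the same index in both strings.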
Hmm. I guess the underlying intuition here is that it's better to preserve the b that's at the same index in both strings? So, e.g., with diffChars('1 baa 2','1 bbb 2') you WOULD want to preserve the first b?
I think to get a diff that matches your intuition here, you probably want a diffing algorithm where edits can be substitutions, like a diff based on Levenshtein distance? If the edits you're making can be substitutions, then the single optimal way to convert bbb to aab is to substitute the first two b's with a's (which achieves the transformation with 2 edits). But to the Myers algorithm, which can only do insertions and deletions, it simply doesn't matter which of the three b's you keep; the edit distance is the same either way (4, made up of 2 insertions and 2 deletions).
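The two cost models can be compared numerically. The sketch below is plain JavaScript, and `levenshtein` and `indelDistance` are illustrative names, not jsdiff functions. It computes the textbook Levenshtein distance, where a substitution costs 1, and the insertion/deletion-only distance, computed as len(a) + len(b) - 2 * LCS(a, b), which is the quantity a Myers-style diff minimises.

```javascript
// Levenshtein distance: insertions, deletions and substitutions each cost 1.
// `prev` is the previous DP row: prev[j] = distance(a[0..i-1), b[0..j)).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const sub = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      cur[j] = Math.min(sub, prev[j] + 1, cur[j - 1] + 1);
    }
    prev = cur;
  }
  return prev[b.length];
}

// Insert/delete-only distance via the longest common subsequence:
// every character outside the LCS must be deleted from a or inserted from b.
function indelDistance(a, b) {
  let prev = new Array(b.length + 1).fill(0); // prev[j] = LCS length so far
  for (let i = 1; i <= a.length; i++) {
    const cur = [0];
    for (let j = 1; j <= b.length; j++) {
      cur[j] = a[i - 1] === b[j - 1] ? prev[j - 1] + 1 : Math.max(prev[j], cur[j - 1]);
    }
    prev = cur;
  }
  return a.length + b.length - 2 * prev[b.length];
}

console.log(levenshtein('bbb', 'aab'));   // 2: substitute two characters
console.log(indelDistance('bbb', 'aab')); // 4: 2 deletions + 2 insertions
```

Under the Levenshtein model the position-preserving answer is the unique cheapest one, while under the insert/delete-only model every choice of which b to keep costs the same 4.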
Since jsdiff is based on the Myers diff algorithm and that's unlikely to change, I don't think there's a reasonable way for us to make jsdiff behave in the way you wanted, though.