Cleaner diffs for inserted array items
lukechilds opened this issue · comments
Isolated test case as requested in avajs/ava#1521
If you append an item to an array and diff:
var concordance = require('concordance');
const orig = [
'foo',
'bar',
'fizz',
'buzz'
];
const copy = Array.from(orig);
copy.push('hi')
concordance.diff(orig, copy);
You get neat output:
[
'foo',
'bar',
'fizz',
'buzz',
+ 'hi',
]
However if you insert anywhere else and diff:
var concordance = require('concordance');
const orig = [
'foo',
'bar',
'fizz',
'buzz'
];
const copy = Array.from(orig);
copy.unshift('hi')
concordance.diff(orig, copy);
The output is quite hard to follow after the inserted item:
[
- 'foo',
+ 'hi',
- 'bar',
+ 'foo',
- 'fizz',
+ 'bar',
- 'buzz',
+ 'fizz',
+ 'buzz',
]
Repro with git producing clean results:
$ cat orig
[
'foo',
'bar',
'fizz',
'buzz'
];
$ cat copy
[
'hi',
'foo',
'bar',
'fizz',
'buzz'
];
$ git diff orig copy
diff --git a/orig b/copy
index 5d484f0..f4066df 100644
--- a/orig
+++ b/copy
@@ -1,4 +1,5 @@
[
+ 'hi',
'foo',
'bar',
'fizz',
I looked into this a bit, out of curiosity. This issue is specific to arrays of strings; it only occurs when string items of two arrays get diffed against eachother. What's going on is that the first pair of strings ('foo'
, 'hi'
) are diffed (producing - 'foo'
, + 'hi'
) and then the next pair of strings are diffed ('bar'
, 'foo'
) (producing - 'bar'
, + 'foo'
) and so on.
The issue is caused by the fact that string descriptors provide diffDeep()
, and PrimitiveItem
descriptors' prepareDiff()
method (which would ordinarily be responsible for lining up the items of the two arrays) short-circuits when its value's descriptor provides diffDeep()
.
Simply removing that short-circuiting step solves this issue, but also prevents string items from ever being deeply diffed against eachother. E.g., the original example correctly produces
[
+ 'hi',
'foo',
'bar',
'fizz',
'buzz',
]
but diffing two arrays of similar strings against eachother produces e.g.
diff(['foo\nbar\nfizz\nbuzz'], ['hi\nfoo\nbar\nfizz\nbuzz'])
[
- `foo␊
- bar␊
- fizz␊
- buzz`,
+ `hi␊
+ foo␊
+ bar␊
+ fizz␊
+ buzz`,
]
instead of the current result,
[
+ hi␊
`foo␊
bar␊
fizz␊
buzz`,
]
(which is incidentally itself a little bugged, the opening backtick is on the wrong line).
I'm not sure there's a perfect solution here, at least with the current greedy diff algorithm; without an edit-script-minimizing algorithm, (edited: I don't understand the algorithm as well as I thought, sorry) it doesn't seem obvious whether an item should be treated as inserted or modified.