concordancejs / concordance

Compare, format, diff and serialize any JavaScript value

Cleaner diffs for inserted array items

lukechilds opened this issue

Isolated test case as requested in avajs/ava#1521

If you append an item to an array and diff:

const concordance = require('concordance');
const orig = [
  'foo',
  'bar',
  'fizz',
  'buzz'
];
const copy = Array.from(orig);
copy.push('hi');
// diff() returns the formatted diff as a string, so print it
console.log(concordance.diff(orig, copy));

You get neat output:

  [
    'foo',
    'bar',
    'fizz',
    'buzz',
+   'hi',
  ]

However if you insert anywhere else and diff:

const concordance = require('concordance');
const orig = [
    'foo',
    'bar',
    'fizz',
    'buzz'
];
const copy = Array.from(orig);
copy.unshift('hi');
// diff() returns the formatted diff as a string, so print it
console.log(concordance.diff(orig, copy));

The output is quite hard to follow after the inserted item:

  [
-   'foo',
+   'hi',
-   'bar',
+   'foo',
-   'fizz',
+   'bar',
-   'buzz',
+   'fizz',
+   'buzz',
  ]

Repro with git producing clean results:

$ cat orig
[
    'foo',
    'bar',
    'fizz',
    'buzz'
];

$ cat copy
[
    'hi',
    'foo',
    'bar',
    'fizz',
    'buzz'
];

$ git diff orig copy
diff --git a/orig b/copy
index 5d484f0..f4066df 100644
--- a/orig
+++ b/copy
@@ -1,4 +1,5 @@
 [
+    'hi',
     'foo',
     'bar',
     'fizz',

I looked into this a bit, out of curiosity. This issue is specific to arrays of strings: it only occurs when the string items of two arrays get diffed against each other. The first pair of strings ('foo', 'hi') is diffed (producing - 'foo', + 'hi'), then the next pair ('bar', 'foo') is diffed (producing - 'bar', + 'foo'), and so on.
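The pairing behaviour described above can be reproduced with a small toy model. This is only a sketch: `positionalDiff` is a hypothetical helper written for illustration, not concordance's code, and it simply assumes that each item is compared with the item at the same index in the other array.

```javascript
// Toy model only: NOT concordance's implementation.
// Each index of `actual` is compared with the same index of `expected`.
function positionalDiff(actual, expected) {
  const lines = [];
  const length = Math.max(actual.length, expected.length);
  for (let i = 0; i < length; i++) {
    const hasActual = i < actual.length;
    const hasExpected = i < expected.length;
    if (hasActual && hasExpected && actual[i] === expected[i]) {
      // Same string at the same position: print it unchanged
      lines.push(`  '${actual[i]}',`);
      continue;
    }
    // Otherwise the pair is reported as a removal followed by an addition
    if (hasActual) lines.push(`- '${actual[i]}',`);
    if (hasExpected) lines.push(`+ '${expected[i]}',`);
  }
  return lines;
}

const pairedOrig = ['foo', 'bar', 'fizz', 'buzz'];
const pairedCopy = ['hi', ...pairedOrig];
console.log(positionalDiff(pairedOrig, pairedCopy).join('\n'));
```

Because the insertion at index 0 shifts every later item by one, no pair ever matches, which gives the same alternating -/+ pattern as the hard-to-follow output reported in the issue.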

The issue is caused by the fact that string descriptors provide diffDeep(), and PrimitiveItem descriptors' prepareDiff() method (which would ordinarily be responsible for lining up the items of the two arrays) short-circuits when its value's descriptor provides diffDeep().

Simply removing that short-circuiting step solves this issue, but also prevents string items from ever being deeply diffed against each other. With the step removed, the original example correctly produces

  [
+   'hi',
    'foo',
    'bar',
    'fizz',
    'buzz',
  ]

but diffing two arrays of similar strings against each other produces, for example,

diff(['foo\nbar\nfizz\nbuzz'], ['hi\nfoo\nbar\nfizz\nbuzz'])

  [
-   `foo␊
-   bar␊
-   fizz␊
-   buzz`,
+   `hi␊
+   foo␊
+   bar␊
+   fizz␊
+   buzz`,
  ]

instead of the current result,

  [
+   hi␊
    `foo␊
    bar␊
    fizz␊
    buzz`,
  ]

(which is incidentally itself slightly buggy: the opening backtick is on the wrong line).

I'm not sure there's a perfect solution here, at least with the current greedy diff algorithm. Without an edit-script-minimizing algorithm (edited: I don't understand the algorithm as well as I thought, sorry), it isn't obvious whether an item should be treated as inserted or modified.
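For comparison, here is a minimal sketch of what an edit-script-minimizing approach could look like for arrays of strings, using a longest-common-subsequence table. `lcsDiff` is a hypothetical name, and nothing here is taken from concordance. It assumes items are atomic, so an item is either kept, removed, or inserted, and never "modified".

```javascript
// Hypothetical helper, NOT part of concordance: align two arrays of strings
// with a longest-common-subsequence (LCS) table, treating items as atomic.
function lcsDiff(actual, expected) {
  const n = actual.length;
  const m = expected.length;
  // table[i][j] = LCS length of actual.slice(i) and expected.slice(j)
  const table = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      table[i][j] = actual[i] === expected[j]
        ? table[i + 1][j + 1] + 1
        : Math.max(table[i + 1][j], table[i][j + 1]);
    }
  }
  const lines = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (actual[i] === expected[j]) {
      lines.push(`  '${actual[i]}',`); // item kept
      i++;
      j++;
    } else if (table[i + 1][j] >= table[i][j + 1]) {
      lines.push(`- '${actual[i]}',`); // item removed
      i++;
    } else {
      lines.push(`+ '${expected[j]}',`); // item inserted
      j++;
    }
  }
  while (i < n) lines.push(`- '${actual[i++]}',`);
  while (j < m) lines.push(`+ '${expected[j++]}',`);
  return lines;
}

const alignedOrig = ['foo', 'bar', 'fizz', 'buzz'];
const alignedCopy = ['hi', ...alignedOrig];
console.log(lcsDiff(alignedOrig, alignedCopy).join('\n'));
```

For the unshift example this reports a single inserted 'hi' followed by four unchanged items. The trade-off is the one described above: because every item is atomic, two similar multi-line strings would be shown as a full removal plus a full addition rather than being deeply diffed.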