Dana-Ferguson / google-diff-match-patch

Automatically exported from code.google.com/p/google-diff-match-patch

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

diff_cleanupSemantic sometimes produces much worse results (visually)

GoogleCodeExporter opened this issue · comments

What steps will reproduce the problem?
Call diff_cleanupSemantic with the results of diff_main.

What is the expected output? What do you see instead?
I expect the output to be at least as visually ‘clean’ as with no 
diff_cleanupSemantic call.
Sometimes the output is much worse after the diff_cleanupSemantic call.

What version of the product are you using? On what operating system?
JavaScript version, r103.

Please provide any additional information below.

I’ve been using DMP for a few years now to display diffs to text files before 
svn commits.  (Thank you.)
I sometimes see results that don’t look as I would expect (or like).
I’ve finally tracked it down to a simple example.
I always call diff_cleanupSemantic after diff_main as the final output is for 
user display.
The outputs that result from this appear to vary 
unexpectedly/uncontrollably/randomly.

The following small example produces a poor display after diff_cleanupSemantic, 
but a slightly better display if each tab is replaced with 2 spaces, better 
still with 6 spaces.  However, if you replace all occurrences of 'comment' with 
'ABC' then the output is barely changed after the diff_cleanupSemantic calls in 
each case.

T1='BEGIN\n\tUInt16\tflaps;\n\taabb\tfat_mood:1;\n\txxyy\tisthing:1;\n\tccdd\ttr
ee_climb:1;\n\tUInt8\tflick:5;\nEND';
T2='BEGIN\n\tUInt16\tflaps;\t\t// first comment\n\taabb\tfat_mood:1;\t\t// 
second comment\n\txxyy\tisthing:1;'+
    '\t// third comment\n\tccdd\ttree_climb:1;\t// forth comment\n\tUInt8\tflick:3;\t// fifth comment\nEND';
diffs=DMP.diff_main(T1, T2)

diffs =>
    BEGIN¬
        UInt16  flaps;<ins>     // first comment</ins>¬
        aabb    fat_mood:1;<ins>        // second comment</ins>¬
        xxyy    isthing:1;<ins> // thrid comment</ins>¬
        ccdd    tree_climb:1;<ins>  // forth comment</ins>¬
        UInt8   flick:<del>5</del><ins>3</ins>;<ins>    // fifth comment</ins>¬
    END

diff_cleanupSemantic(diffs) =>
    BEGIN¬
        UInt16  flaps;<del>¬
        aabb    fat_mood:1;¬
        xxyy    isthing:1;¬
        ccdd    tree_climb:1;¬
        UInt8   flick:5;</del><ins>     // first comment¬
        aabb    fat_mood:1;     // second comment¬
        xxyy    isthing:1;  // thrid comment¬
        ccdd    tree_climb:1;   // forth comment¬
        UInt8   flick:3;    // fifth comment</ins>¬
    END

While the second has less elements than the first it cannot be said to meet the 
intent to “Increase human readability by factoring out commonalities which 
are likely to be coincidental”.

The attached HTML file, when dropped in the diff_match_patch/javascript folder 
& opened in a browser, fully illustrates the problem.

Original issue reported on code.google.com by chris...@gmail.com on 30 Mar 2012 at 2:46

Attachments:

I've seen the same problem, often if you have taken single line function call, 
and broken it across multiple lines for a change, the raw output always only 
shows insertions of newlines, but after semantic cleanup we have larger 
deletions and insertions, and what it determines to be part of the those 
deletions and insertions is quite variable depending on the level of 
indentation/spaces, as noted in the bug above.

For example:

Before Text:
    EXPECT_SUCCESS(SM::getMV(noSubject, S::Builder().set_values(values).finalize(), dummy, results));

After Text:
    EXPECT_SUCCESS(SM::getRMV(noSubject,
                   S::Builder().set_values(values).finalize(),
                   dummy,
                   results));

Diff and then semantic cleanup:
EXPECT_SUCCESS(SM::getRMV(noSubject,<ins>¶</ins>
S::Builder().set_values(values).finalize(), <del>dummy,</del><ins>¶
dummy,¶</ins>
results));

Original comment by d...@twkie.net on 27 Nov 2012 at 11:57