Shoobx / xmldiff

A library and command line utility for diffing xml

How best to return and view actual diffs?

arkadianriver opened this issue

The docs say that if the patch listing doesn't successfully recreate the document, it's a bug, but they also say there's no guarantee of arriving there the same way each time. How can I best use the resulting listing for comparison? When I removed a leaf node and compared before and after, instead of a simple DeleteNode, the patch listing first moved a second-level node in an unexpected way, which forced the entire tree to be rebuilt with moves, inserts, and updates all over the place. It recreated the doc beautifully, but what can I do with that edit list to easily see the differences between the before and after?

My goal is to capture when text nodes are updated or attributes are changed, and to insert an attribute in the result to indicate the change: a revised="yes" of sorts. With the behavior I described, that doesn't seem possible: if the entire tree might be recreated, searching for UpdateTextIn actions will yield far more updates than necessary.

Thanks for any help.

It's a common problem when making test data that your nodes are too similar to each other, and that xmldiff can't tell the difference between you moving a node or changing it. Real XML data is usually more complex and doesn't have that problem as much. Trying out different modes to see what best fits your data is helpful there.

The xmldiff xml-formatter does something similar to what you want: it adds information about the actions to the leaves, like diff:rename="br" showing that this node was called "br" in the first file, diff:insert="" if it's new, and diff:delete="" if it was removed.
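To turn those markers into the revised="yes" flag asked about above, one option is to post-process the xml-formatter output. The following is only a rough sketch using the standard library, not part of xmldiff: the function name `mark_revised` and the attribute name `revised` are made up for illustration, and treating child elements in the diff namespace as a sign of a text change is an assumption about how the formatter marks text edits. The sample input is the formatter result quoted later in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace used by the xml-formatter output quoted in this thread.
DIFF_NS = "http://namespaces.shoobx.com/diff"
DIFF_PREFIX = "{" + DIFF_NS + "}"

# Keep the "diff" prefix when serializing instead of an auto-generated one.
ET.register_namespace("diff", DIFF_NS)


def mark_revised(root, attr_name="revised", attr_value="yes"):
    """Hypothetical helper: flag every element that carries diff: markup.

    An element counts as changed if it has an attribute in the diff
    namespace, or (assumption) a direct child element in that namespace.
    Returns the number of elements that were flagged.
    """
    marked = 0
    for elem in root.iter():
        if elem.tag.startswith(DIFF_PREFIX):
            continue  # skip the diff markup elements themselves
        has_diff_attr = any(k.startswith(DIFF_PREFIX) for k in elem.attrib)
        has_diff_child = any(c.tag.startswith(DIFF_PREFIX) for c in elem)
        if has_diff_attr or has_diff_child:
            elem.set(attr_name, attr_value)
            marked += 1
    return marked


SAMPLE = """<myItem xmlns:diff="http://namespaces.shoobx.com/diff" rev="v1.1" diff:add-attr="rev">
<myCrit><p>Lorem ipsum</p></myCrit><myValue>&gt; <myLabel keyref="KEY_TWO"/>
  </myValue></myItem>"""

root = ET.fromstring(SAMPLE)
count = mark_revised(root)
print(count, ET.tostring(root, encoding="unicode"))
```

Only the element that actually carries diff: markup gets flagged here, so the unchanged children stay untouched. Whether that granularity is right depends on your data.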

Sorry for not getting back to you sooner. I'm not sure what you meant by test data; I was diffing an old and a new version of a file that was edited the way an author might edit it. I think I just deleted a paragraph from a section.

Anyway, I took a look at the xml-formatter output, and its results are indeed more promising than the patch listing. It marked changes only in those areas where there actually were changes. I did find, however, that it deleted a node and re-added it. But maybe I can try to detect those instances somehow and see if I can ignore them.

e.g.

In the context of the entire file diff, the xml-formatted result was this. Notice that the <myValue> node is the same but was deleted and re-inserted. I double-checked and the whitespace is the same in both files.

            <myItem rev="v1.1" diff:add-attr="rev">
                <myCrit>
                    <p>Lorem ipsum</p>
                </myCrit>
                <myValue diff:delete="">&gt; <myLabel
                        keyref="KEY_TWO" />
                </myValue>
                <myValue diff:insert="">&gt; <myLabel
                        keyref="KEY_TWO" diff:insert="" />
                </myValue>
            </myItem>
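One way to detect those delete/re-insert pairs might look like the sketch below. It is standard-library only and not an xmldiff feature: `signature`, `find_noop_pairs`, and `collapse_noop_pairs` are hypothetical helpers, it only looks at adjacent siblings, and it normalizes whitespace when comparing, which is an assumption that may hide whitespace-only edits. The sample is the snippet above with the diff namespace declaration added so it parses on its own.

```python
import xml.etree.ElementTree as ET

DIFF_NS = "http://namespaces.shoobx.com/diff"
DIFF_PREFIX = "{" + DIFF_NS + "}"


def signature(elem, include_tail=False):
    """Comparable summary of a subtree, ignoring diff: attributes.

    Whitespace runs are collapsed (assumption: layout-only differences
    should not count as changes).
    """
    attrs = tuple(sorted((k, v) for k, v in elem.attrib.items()
                         if not k.startswith(DIFF_PREFIX)))
    text = " ".join((elem.text or "").split())
    children = tuple(signature(c, include_tail=True) for c in elem)
    tail = " ".join((elem.tail or "").split()) if include_tail else ""
    return (elem.tag, attrs, text, children, tail)


def find_noop_pairs(root):
    """Find adjacent siblings where a deleted node is re-inserted unchanged."""
    pairs = []
    for parent in root.iter():
        kids = list(parent)
        for old, new in zip(kids, kids[1:]):
            if (DIFF_PREFIX + "delete" in old.attrib
                    and DIFF_PREFIX + "insert" in new.attrib
                    and signature(old) == signature(new)):
                pairs.append((parent, old, new))
    return pairs


def collapse_noop_pairs(root):
    """Drop the deleted copy and strip diff: markers from the inserted copy."""
    pairs = find_noop_pairs(root)
    for parent, old, new in pairs:
        parent.remove(old)
        for elem in new.iter():
            for key in [k for k in elem.attrib if k.startswith(DIFF_PREFIX)]:
                del elem.attrib[key]
    return len(pairs)


SAMPLE = """<myItem xmlns:diff="http://namespaces.shoobx.com/diff" rev="v1.1" diff:add-attr="rev">
    <myCrit>
        <p>Lorem ipsum</p>
    </myCrit>
    <myValue diff:delete="">&gt; <myLabel
            keyref="KEY_TWO" />
    </myValue>
    <myValue diff:insert="">&gt; <myLabel
            keyref="KEY_TWO" diff:insert="" />
    </myValue>
</myItem>"""

root = ET.fromstring(SAMPLE)
pairs = find_noop_pairs(root)
removed = collapse_noop_pairs(root)
```

After collapsing, the genuine change (the added rev attribute on myItem) keeps its marker, while the spurious myValue pair is reduced to a single unmarked node.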

But if I pull out just that section, with only the added rev, the xml-formatted diff works just fine.

f1.xml

<myItem>
  <myCrit>
      <p>Lorem ipsum</p>
  </myCrit>
  <myValue>&gt; <myLabel
          keyref="KEY_TWO" />
  </myValue>
</myItem>

f2.xml

<myItem rev="v1.1">
  <myCrit>
      <p>Lorem ipsum</p>
  </myCrit>
  <myValue>&gt; <myLabel
          keyref="KEY_TWO" />
  </myValue>
</myItem>

result

<myItem xmlns:diff="http://namespaces.shoobx.com/diff" rev="v1.1" diff:add-attr="rev">
<myCrit><p>Lorem ipsum</p></myCrit><myValue>&gt; <myLabel keyref="KEY_TWO"/>
  </myValue></myItem>