Shoobx / xmldiff

A library and command line utility for diffing XML

Redundant move nodes

pieterhartel opened this issue

xmldiff produces redundant Move nodes. See the example below.

A solution would be nice, or a workaround; perhaps an option that makes Move actions less likely to appear?

$ cat one.html 
<html>
  <p class="date">9 Apr 2021 17:54</p>
  <p class="date">8 Apr 2021 00:08</p>
  <p class="date">3 Apr 2021 16:53</p>
  <p class="date">2 Apr 2021 02:11</p>
</html>
$ cat two.html 
<html>
  <p class="date">15 Apr 2021 17:55</p>
  <p class="date">14 Apr 2021 00:09</p>
  <p class="date">9 Apr 2021 16:54</p>
  <p class="date">8 Apr 2021 02:12</p>
</html>
$ xmldiff one.html  two.html  
[move, /html/p[1], /html[1], 2]
[move, /html/p[1], /html[1], 1]
[update-text, /html/p[1], "15 Apr 2021 17:55"]
[update-text, /html/p[2], "14 Apr 2021 00:09"]
[update-text, /html/p[3], "9 Apr 2021 16:54"]
[update-text, /html/p[4], "8 Apr 2021 02:12"]
$ xmldiff --version
xmldiff 2.4

Here is another example, where lowering the F parameter from 0.5 to 0.1 makes redundant Insert and Delete actions disappear.
I'm not sure whether this is a safe workaround. In any case, it does not help with the redundant Move actions from my previous comment.

$ cat three.html 
<html>
<span style="font-size: 20px; color:#ff0000;"><b>1QjPz9W8A3f3U6FQTQqfZMJLCCvu1KtZBL</b></span>
</html>
$ cat four.html 
<html>
<span style="font-size: 20px; color:#ff0000;"><b>33T9oxatkFfzzNDZoWJAe7URzmLwFJmttm</b></span>
</html>
$ xmldiff -F 0.5 three.html four.html 
[insert, /html/span[1], b, 0]
[update-text, /html/span/b[1], "33T9oxatkFfzzNDZoWJAe7URzmLwFJmttm"]
[delete, /html/span/b[2]]
$ xmldiff -F 0.1 three.html four.html 
[update-text, /html/span/b[1], "33T9oxatkFfzzNDZoWJAe7URzmLwFJmttm"]
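
For readers unfamiliar with the F option: it is the similarity threshold that two nodes must reach before xmldiff treats them as the same node. The sketch below only illustrates the idea of such a threshold using Python's standard difflib. The helper names `similarity` and `would_match` are hypothetical, and xmldiff's own scoring also takes tags, attributes and children into account, so the score printed here need not agree with xmldiff's decision for these files.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a, b).ratio()


def would_match(a: str, b: str, threshold: float) -> bool:
    """Hypothetical rule: two texts match if their score reaches the threshold."""
    return similarity(a, b) >= threshold


# The two <b> texts from three.html and four.html.
old = "1QjPz9W8A3f3U6FQTQqfZMJLCCvu1KtZBL"
new = "33T9oxatkFfzzNDZoWJAe7URzmLwFJmttm"

score = similarity(old, new)
for f in (0.5, 0.1):
    # A lower threshold accepts less similar pairs as "the same node".
    print(f"F={f}: score={score:.2f}, match={would_match(old, new, f)}")
```

When the pair is accepted as a match, a single update-text is enough; when it is rejected, the old node has to be deleted and a new one inserted, which is the difference between the two outputs above.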

In this case xmldiff considers the first paragraph, <p class="date">9 Apr 2021 17:54</p>, very similar to the third paragraph in the other file, <p class="date">9 Apr 2021 16:54</p>, and it is. It therefore moves the paragraph to the new position.
It then also needs to update the text, so it does that as well.

Sure, that means the move isn't needed, since all the texts have changed. But figuring that out isn't easy: you basically have to apply the changes and then realize that some of them are not needed.

So this output is expected.
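
The similarity argument above can be checked with a rough sketch. It compares only the paragraph texts using Python's standard difflib, which is an assumption made for illustration and not xmldiff's actual matching code; the helper `best_match` is hypothetical.

```python
from difflib import SequenceMatcher

# Paragraph texts from one.html and two.html.
old_dates = ["9 Apr 2021 17:54", "8 Apr 2021 00:08",
             "3 Apr 2021 16:53", "2 Apr 2021 02:11"]
new_dates = ["15 Apr 2021 17:55", "14 Apr 2021 00:09",
             "9 Apr 2021 16:54", "8 Apr 2021 02:12"]


def best_match(text, candidates):
    """Return (score, candidate) for the candidate most similar to text."""
    scored = [(SequenceMatcher(None, text, c).ratio(), c) for c in candidates]
    return max(scored)


best = best_match(old_dates[0], new_dates)
print(best)
```

For the first paragraph, the closest text in two.html is "9 Apr 2021 16:54", which sits in third position. A matcher that pairs nodes by similarity will therefore see that paragraph as having moved, even though its text is then updated anyway.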