Shoobx / xmldiff

A library and command line utility for diffing xml

A diff format for humans

regebro opened this issue · comments

xmldiff has, since the start, focused on making diffs that computers use to do things. xmldiff 2.0 added formatters that produce XML output you can transform with XSLT into GUI output with colors or similar. But that requires extra coding depending on your format, and it obviously does not work on a command line.

If people have ideas for how a command-line diff output for humans would look, discuss that here.

I think something like the git diff output could be a start, i.e.

diff --git a/CHANGES.rst b/CHANGES.rst
index 56e66c8..b56bebd 100644
--- a/CHANGES.rst
+++ b/CHANGES.rst
@@ -1,6 +1,12 @@
 Changes
 =======
 
+2.5 (unreleased)
+----------------
+
+- Nothing changed yet.
+
+
 2.4 (2019-10-09)
 ----------------
 
diff --git a/setup.py b/setup.py
index 9c9c79c..5583987 100644
--- a/setup.py
+++ b/setup.py
@@ -1,7 +1,7 @@
 from io import open
 from setuptools import setup, find_packages
 
-version = '2.4'
+version = '2.5.dev0'
 
 with open('README.rst', 'rt', encoding='utf8') as readme:
     description = readme.read()

(I just grabbed the diff for the latest commit to master here)

The part I consider most important is something like the version line.
It splits the change across two lines, the old value and the new value, which makes it easier to read.

In git diff the line @@ -1,7 +1,7 @@ is to indicate the location in the file, for context.
Perhaps xmldiff could use the tree path? @@ html>body>div.main>h1#title @@ I'm not sure of the best way to handle the path, so I used HTML as an example.

Following the diff format, and keeping the syntax compatible with it so that existing highlighting works, is the best option, in my opinion. It is a widely used and recognized syntax for viewing the differences between two files.

The diff format has GNU documentation here.
It also has documentation specifically on the output formats here.

I don't think a line-by-line format is a good start, as that's not at all how XML works. We need to think outside the box here.

I was thinking more of node by node, with the tree path location above each change to indicate what changed.
The line-by-line example was mostly to show the syntax of diff tools. Something like this:

xmldiff a/simple.xml b/simple.xml
--- a/simple.xml
+++ b/simple.xml
@@ /breakfast_menu[3]/food/name @@
- Berry-Berry Belgian Waffles
+ Very-Berry Belgian Waffles
@@ /breakfast_menu[5]/food/description @@
- Two eggs, bacon or sausage, toast, and our ever-popular hash browns
+ Two eggs, bacon or sausage, toast, and our famous hash browns

I used this XML document for the nodes.
I'm not sure of the best way to show attribute changes, but maybe place the attribute in parentheses?

@@ /root[5]/node(attribute) @@
- old-value
+ new-value

Inserting or deleting is fairly simple, since you would just omit the other line:

@@ /root[2]/node @@
- node contents
@@ /root[4]/node @@
+ node contents

This would give the same amount of information that the CLI already gives, I believe.
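
To make the proposal concrete, here is a minimal sketch that prints this kind of output using only the Python standard library. It is not xmldiff's algorithm: it naively pairs children by tag and position, so it cannot detect moves or renames. The function names (`human_diff` and the helpers) are made up for illustration, the paths use XPath-style positional indexes rather than the exact form in the examples above, and showing a deleted or inserted node as serialized XML is just one possible choice.

```python
import xml.etree.ElementTree as ET


def _text(elem):
    """Return the element's own text, stripped, or '' if it has none."""
    return (elem.text or "").strip()


def _child_paths(parent, parent_path):
    """Yield (path, child) pairs with XPath-style per-tag positions."""
    seen = {}
    for child in parent:
        seen[child.tag] = seen.get(child.tag, 0) + 1
        yield f"{parent_path}/{child.tag}[{seen[child.tag]}]", child


def _dump(elem):
    """Serialize a whole node so deletions and insertions show their content."""
    return ET.tostring(elem, encoding="unicode").strip()


def human_diff(old, new, path=None):
    """Naively compare two elements and return diff lines (hypothetical format)."""
    path = path or f"/{old.tag}"
    lines = []

    # Attribute changes: "@@ path(attribute) @@", old value then new value.
    for name in sorted(set(old.attrib) | set(new.attrib)):
        before, after = old.get(name), new.get(name)
        if before != after:
            lines.append(f"@@ {path}({name}) @@")
            if before is not None:
                lines.append(f"- {before}")
            if after is not None:
                lines.append(f"+ {after}")

    # Text changes: omit the side that is empty.
    if _text(old) != _text(new):
        lines.append(f"@@ {path} @@")
        if _text(old):
            lines.append(f"- {_text(old)}")
        if _text(new):
            lines.append(f"+ {_text(new)}")

    # Children are matched by identical path (tag plus position) only.
    old_children = list(_child_paths(old, path))
    new_children = list(_child_paths(new, path))
    old_by_path, new_by_path = dict(old_children), dict(new_children)
    for child_path, child in old_children:
        if child_path in new_by_path:
            lines.extend(human_diff(child, new_by_path[child_path], child_path))
        else:
            lines.extend([f"@@ {child_path} @@", f"- {_dump(child)}"])
    for child_path, child in new_children:
        if child_path not in old_by_path:
            lines.extend([f"@@ {child_path} @@", f"+ {_dump(child)}"])
    return lines


if __name__ == "__main__":
    a = ET.fromstring("<menu><food><name>Berry-Berry Belgian Waffles</name></food></menu>")
    b = ET.fromstring("<menu><food><name>Very-Berry Belgian Waffles</name></food></menu>")
    print("\n".join(human_diff(a, b)))
```

A real implementation would presumably build these lines from xmldiff's own edit script instead of a positional walk, so that moved nodes are reported as moves rather than as a delete plus an insert.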

I just wanted to throw my thoughts down on this.

Thanks for the feedback! A format like that is definitely an improvement on the current default format.

commented

I was looking for a way to see old and new values (and ignore all the noise introduced by the xml generator).
xmldiff by default only shows the new value for changed attributes, but not the old value.
The above would be a great feature.

A quick and dirty way that showed what I was looking for was:
xmldiff -f xml -p a/file.xml b/file.xml | grep diff

Maybe it would also work if this
xmldiff -f diff a/file b/file
would output
[update-attribute, /module/views/layers[1], image, "old_value", "new_value"]
instead of
[update-attribute, /module/views/layers[1], image, "new_value"]
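
As a rough illustration of that second idea, the sketch below looks up the old attribute value in the original document and prints it next to the new one. It uses only the standard library and does not call xmldiff. The helper names are hypothetical, and it assumes the path in the action still resolves against the unmodified original tree, which may not hold if earlier actions in the same diff inserted, deleted, or moved nodes.

```python
import xml.etree.ElementTree as ET


def old_attribute_value(root, path, name):
    """Resolve a path like /module/views/layers[1] against the original root.

    The first path segment is assumed to name the root element itself,
    so it is dropped and the rest is searched relative to the root.
    """
    segments = path.strip("/").split("/")
    relative = "/".join(["."] + segments[1:])
    node = root.find(relative)
    return None if node is None else node.get(name)


def format_update_attribute(root, path, name, new_value):
    """Render an update-attribute line that includes the old value."""
    old_value = old_attribute_value(root, path, name)
    return f'[update-attribute, {path}, {name}, "{old_value}", "{new_value}"]'


if __name__ == "__main__":
    original = ET.fromstring(
        '<module><views><layers image="old_value"/></views></module>'
    )
    print(format_update_attribute(
        original, "/module/views/layers[1]", "image", "new_value"
    ))
```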