Shoobx / xmldiff

A library and command line utility for diffing xml

word diffing instead of character

jennnson opened this issue

Hello xmldiff community,

I am currently working on a project with this library, and I was wondering if one of you came across a solution to the following problem:
xmldiff can compare two files and diffs the text inside their tags character by character. I would like to compare the text word by word instead.

For example, file1 has:
<title>title is dummy</title>

file2 has:
<title>title is yummy</title>

The diff output is:
<title>title is <delete>d</delete><insert>y</insert>ummy</title>

I would like the result to be as follows:
<title>title is <delete>dummy</delete><insert>yummy</insert></title>

There is probably an easy solution to this problem, but I cannot seem to find it. Any ideas on how to do it?

Thank you so much in advance.

Well, I don't know about easy, but I think the easiest would be a custom Formatter class that overrides the _make_diff_tags() method and splits the text into words before inserting the tags.

I don't know if diff_match_patch() supports anything other than text, but difflib in the Python standard library can compare sequences of words, so that might work.
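
To illustrate the difflib idea, here is a minimal standalone sketch of word-level diffing. It is not xmldiff code: the helper name word_diff and the plain <delete>/<insert> tag strings are assumptions taken from the example in the question. It returns a string without XML-escaping the text, so wiring it into a Formatter subclass would still need to build the real diff nodes that the formatter emits.

```python
import difflib
import re


def word_diff(left, right):
    """Mark up the changes from `left` to `right`, diffing whole words.

    Hypothetical helper for illustration; not part of xmldiff.
    """
    # Tokenize into alternating runs of whitespace and non-whitespace,
    # so the original spacing is preserved when tokens are joined back.
    left_tokens = re.findall(r"\s+|\S+", left)
    right_tokens = re.findall(r"\s+|\S+", right)

    # SequenceMatcher works on any sequence of hashable items,
    # so it can compare lists of words instead of characters.
    matcher = difflib.SequenceMatcher(
        None, left_tokens, right_tokens, autojunk=False
    )

    parts = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            parts.append("".join(left_tokens[i1:i2]))
            continue
        # A "replace" is emitted as a deletion followed by an insertion.
        if op in ("delete", "replace"):
            parts.append("<delete>" + "".join(left_tokens[i1:i2]) + "</delete>")
        if op in ("insert", "replace"):
            parts.append("<insert>" + "".join(right_tokens[j1:j2]) + "</insert>")
    return "".join(parts)


print(word_diff("title is dummy", "title is yummy"))
# title is <delete>dummy</delete><insert>yummy</insert>
```

Treating whitespace runs as tokens keeps the spacing intact, at the cost of occasionally marking a space as changed along with a neighbouring word.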