The HtmlDiffer comparison does not work well with &nbsp;
GoogleCodeExporter opened this issue · comments
What steps will reproduce the problem?
1. Run a programmatic comparison using the HtmlDiffer API.
2. The strings being compared contain spacing characters, i.e. &nbsp;
3. The output replaces &nbsp; with a special character -> á
What is the expected output? What do you see instead?
The output replaces &nbsp; with a special character -> á
It should be treated as just a space and displayed properly (&nbsp;)
I basically used the unit test code for the comparison, HtmlTestFixture.java
What version of the product are you using? On what operating system?
Please provide any additional information below.
Original issue reported on code.google.com by dominic....@gmail.com
on 2 Aug 2010 at 4:32
I tried to reproduce this with no success. Can you post the exact strings you
are trying to diff with HtmlTestFixture.java ?
Also, you could always do some preprocessing before passing the input
strings to DaisyDiff. I am actually using: input = input.replaceAll("&nbsp;"," ");
in production code. Maybe this would solve your problem as well.
Original comment by kkape...@gmail.com
on 10 Aug 2010 at 3:20
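A minimal sketch of the preprocessing step suggested above, assuming the entity being replaced is the literal text &nbsp;. The class and method names are hypothetical and are not part of DaisyDiff. It uses String.replace, which does a literal replacement of every occurrence, so no regex escaping is involved.

```java
class NbspPreprocess {
    // Hypothetical helper: swaps each literal "&nbsp;" entity for an ordinary space
    // before the string is handed to the differ.
    static String normalizeNbsp(String input) {
        return input.replace("&nbsp;", " ");
    }

    public static void main(String[] args) {
        String html = "<p>breakthrough for&nbsp;&nbsp;&nbsp;Web page designers</p>";
        System.out.println(normalizeNbsp(html));
    }
}
```

Each entity becomes exactly one ordinary space, so the count of spaces in the string is unchanged even though a browser may later collapse them.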
The code that I am using is:
HtmlTestFixture d = new HtmlTestFixture();
String one = "<p>Style sheets represent a major breakthrough for&nbsp;&nbsp;&nbsp;\n Web page designers,expanding their ability to improve the appearance of their pages. </p>";
String two = "<p>Style sheets represent a major breakthrough for Web page designers,expanding their ability to improve the appedfarance oops i am new of their . </p>";
String result = d.diff(one, two);
System.out.println(result);
And the output I get is:
<?xml version="1.0" encoding="UTF-8"?><p>Style sheets represent a major
breakthrough forááá Web page designers,expanding their ability to improve
the <span class="diff-html-removed" id="removed-diff-0" previous="first-diff"
changeId="removed-diff-0" next="added-diff-0">appearance </span><span
class="diff-html-added" id="added-diff-0" previous="removed-diff-0"
changeId="added-diff-0" next="removed-diff-1">appedfarance oops i am new
</span>of their <span class="diff-html-removed" id="removed-diff-1"
previous="added-diff-0" changeId="removed-diff-1" next="last-diff">pages</span>
. </p>
which is almost perfect except for the á characters instead of &nbsp;.
input = input.replaceAll("&nbsp;"," "); will not solve the problem, as you will lose
the information about how much space is present between two words or sections, unless
the text is between quotes.
Original comment by dominic....@gmail.com
on 11 Aug 2010 at 5:50
3 points.
1. I tried your example with HtmlTestFixture and got normal spaces (not nbsp,
but not strange characters either).
2. The HtmlTestFixture is very simple (just for unit tests). For production
quality code I would advise you to look at the main method that performs
several other cleanups. Normal DaisyDiff does exactly what you want (see
attached screenshot)
3. Can you clarify what data is lost by the "replaceAll" method? In your
example if I run this method then I still have the information that 3 spaces
exist before newline. What data is lost? What is the difference if the text
is in quotes or not?
Original comment by kkape...@gmail.com
on 16 Aug 2010 at 1:07
Attachments:
I really don't understand how this is working at your end. Could it be a JVM issue?
Maybe I could try some other code, as you suggested.
What I meant by saying you can't use input.replaceAll("&nbsp;"," ") can be explained by
viewing the code below in a browser.
<p>hello how are you</p>
<p>hello how are you</p>
The output will be the same.
Original comment by dominic....@gmail.com
on 16 Aug 2010 at 3:56
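The concern above can be illustrated with a rough sketch. It assumes the browser collapses runs of ordinary whitespace in normal text while leaving non-breaking spaces (U+00A0) alone. The collapse method is a hypothetical approximation of that rendering rule, not browser or DaisyDiff code. It relies on the fact that Java's \s does not match U+00A0 by default.

```java
class WhitespaceCollapseDemo {
    // Approximates browser rendering of normal text: each run of ordinary
    // whitespace becomes one space. U+00A0 is not matched by \s here, so it survives.
    static String collapse(String text) {
        return text.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String withNbsp = "hello\u00A0\u00A0\u00A0how are you";
        String withSpaces = withNbsp.replace('\u00A0', ' ');

        // Non-breaking spaces are kept, so the three-character gap is preserved.
        System.out.println(collapse(withNbsp).length());
        // Ordinary spaces collapse, so the width of the gap is lost on display.
        System.out.println(collapse(withSpaces));
    }
}
```

Under these assumptions the replaced string still contains three spaces, which matches the earlier reply, but a browser would display them as one, which is the loss being described here.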
I had the same issue with the &nbsp;.
In my case, htmldiff was correctly replacing the &nbsp; with a non-breaking space
character encoded in UTF-8. My browser, on the other hand, was configured with a
character encoding other than UTF-8.
Solution: configure your browser's character encoding to UTF-8.
Original comment by mcdoct...@gmail.com
on 19 Nov 2010 at 7:52
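A small sketch of the encoding mismatch described above, assuming the differ emits the non-breaking space as the character U+00A0 in UTF-8. The class and method names are hypothetical. It shows the two bytes UTF-8 produces for U+00A0 and what a viewer sees if it decodes them as ISO-8859-1 instead. The exact stray glyph depends on which encoding the viewer assumes, so this will not necessarily reproduce the á from the report.

```java
import java.nio.charset.StandardCharsets;

class NbspEncodingDemo {
    // Returns the UTF-8 bytes of s as space-separated lowercase hex.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Simulates a viewer that wrongly reads UTF-8 output as ISO-8859-1.
    static String decodeUtf8AsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String nbsp = "\u00A0";
        System.out.println(utf8Hex(nbsp));                     // two bytes for one character
        System.out.println(decodeUtf8AsLatin1(nbsp).length()); // one character becomes two
    }
}
```

When the viewer decodes with the same encoding the output was written in, the character round-trips as a single non-breaking space and no stray glyph appears.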
dominic, can you check your browser settings?
Maybe what mcdoctore is suggesting is a solution?
Original comment by kkape...@gmail.com
on 19 Nov 2010 at 3:19
It is working now. Thanks.
Original comment by dominic....@gmail.com
on 19 Nov 2010 at 4:30
Closed since it was apparently a browser issue.
Original comment by kkape...@gmail.com
on 20 Nov 2010 at 10:51
- Changed state: Done