CADbloke / daisydiff

Automatically exported from code.google.com/p/daisydiff

The HtmlDiffer comparison does not work well with &nbsp;

GoogleCodeExporter opened this issue

What steps will reproduce the problem?
1. Run a programmatic comparison using the HtmlDiffer API.
2. The strings being compared contain spacing entities, i.e. &nbsp;
3. The output replaces each &nbsp; with a special character: á

What is the expected output? What do you see instead?

The output replaces each &nbsp; with a special character: á
It should treat &nbsp; as just a space and display it properly.
I basically used the unit test code for the comparison, HtmlTestFixture.java.

What version of the product are you using? On what operating system?


Please provide any additional information below.

Original issue reported on code.google.com by dominic....@gmail.com on 2 Aug 2010 at 4:32

I tried to reproduce this, without success. Can you post the exact strings you 
are trying to diff with HtmlTestFixture.java?

Also, you could always do some preprocessing before passing the input 
strings to DaisyDiff. I am actually using input = input.replaceAll("&nbsp;"," "); 
in production code. This might solve your problem as well.

Original comment by kkape...@gmail.com on 10 Aug 2010 at 3:20
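
The preprocessing idea in this comment can be sketched as below. This is only an illustration: the class and method names are hypothetical and are not part of DaisyDiff. The second method is an assumed alternative, not something proposed in the thread, that swaps the entity for the literal non-breaking space character (U+00A0) instead of an ordinary space.

```java
public class NbspPreprocess {
    // Replace each &nbsp; entity with an ordinary space before diffing.
    // String.replace is used because no regular expression is needed here.
    public static String toPlainSpaces(String html) {
        return html.replace("&nbsp;", " ");
    }

    // Assumed alternative: keep the non-breaking semantics by substituting
    // the literal character U+00A0 for the entity.
    public static String toNbspChars(String html) {
        return html.replace("&nbsp;", "\u00A0");
    }

    public static void main(String[] args) {
        String input = "<p>for&nbsp;&nbsp;&nbsp;Web</p>";
        System.out.println(toPlainSpaces(input));
        // Print the length rather than the text, so the result does not
        // depend on how the console renders U+00A0.
        System.out.println(toNbspChars(input).length());
    }
}
```

Both variants keep the number of spacing characters in the string itself; they differ in how a browser later renders them.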

The code that I am using is:

HtmlTestFixture d = new HtmlTestFixture();
String one = "<p>Style sheets represent a major breakthrough for&nbsp;&nbsp;&nbsp; \n Web page designers,expanding their ability to improve the appearance of their pages. </p>";
String two = "<p>Style sheets represent a major breakthrough for&nbsp;&nbsp;&nbsp;  Web page designers,expanding their ability to improve the appedfarance oops i am new of their . </p>";
String result = d.diff(one, two);
System.out.println(result);

And the output I get is:

<?xml version="1.0" encoding="UTF-8"?><p>Style sheets represent a major 
breakthrough forááá Web page designers,expanding their ability to improve 
the <span class="diff-html-removed" id="removed-diff-0" previous="first-diff" 
changeId="removed-diff-0" next="added-diff-0">appearance </span><span 
class="diff-html-added" id="added-diff-0" previous="removed-diff-0" 
changeId="added-diff-0" next="removed-diff-1">appedfarance oops i am new 
</span>of their <span class="diff-html-removed" id="removed-diff-1" 
previous="added-diff-0" changeId="removed-diff-1" next="last-diff">pages</span> 
. </p>

which is almost perfect except for the á characters in place of &nbsp;.

input = input.replaceAll("&nbsp;"," "); will not solve the problem, as you will lose 
the information about how much space is present between two words or sections, unless 
the text is between quotes.


Original comment by dominic....@gmail.com on 11 Aug 2010 at 5:50

Three points.

1. I tried your example with HtmlTestFixture and got normal spaces (not &nbsp;, 
but not strange characters either).

2. HtmlTestFixture is very simple (it exists just for unit tests). For 
production-quality code I would advise you to look at the main method, which 
performs several other cleanups. Normal DaisyDiff does exactly what you want 
(see attached screenshot).

3. Can you clarify what data is lost by the "replaceAll" method? In your 
example, if I run this method I still have the information that 3 spaces 
exist before the newline. What data is lost? What difference does it make 
whether the text is in quotes or not?

Original comment by kkape...@gmail.com on 16 Aug 2010 at 1:07

I really don't understand how this is working at your end. Could it be a JVM issue?

Maybe I could try some other code, as you suggested.

What I meant by saying you can't use input.replaceAll("&nbsp;"," ") can be 
explained by viewing the code below in a browser.

<p>hello how are you</p>
<p>hello      how are you</p>

The output will be the same.

Original comment by dominic....@gmail.com on 16 Aug 2010 at 3:56
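
The point about browsers can be approximated in code. The sketch below is a rough stand-in for HTML whitespace collapsing, not a browser implementation; the class and method names are hypothetical. It collapses runs of ASCII whitespace to a single space and shows that U+00A0 is not affected, which is why replacing &nbsp; with ordinary spaces changes what is displayed.

```java
public class WhitespaceCollapse {
    // Rough approximation of how a browser collapses runs of ordinary
    // whitespace in normal text flow. U+00A0 is deliberately not in the set.
    public static String collapse(String text) {
        return text.replaceAll("[ \\t\\n\\r\\f]+", " ");
    }

    public static void main(String[] args) {
        String single = "hello how are you";
        String manySpaces = "hello      how are you";
        String nbsp = "hello\u00A0\u00A0\u00A0how are you";

        // Runs of ordinary spaces collapse, so these two display the same.
        System.out.println(collapse(single).equals(collapse(manySpaces))); // true
        // Non-breaking spaces survive, so the spacing width is preserved.
        System.out.println(collapse(single).equals(collapse(nbsp))); // false
    }
}
```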

I had the same issue with &nbsp;.
In my case, htmldiff was correctly replacing &nbsp; with the non-breaking 
space character, encoded in UTF-8. My browser, on the other hand, was 
configured with a character encoding other than UTF-8.
Solution: configure your browser's character encoding to UTF-8.

Original comment by mcdoct...@gmail.com on 19 Nov 2010 at 7:52
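
An encoding mismatch of this kind can be reproduced in isolation. The sketch below encodes U+00A0 with one charset and decodes the bytes with another, printing code points so the result does not depend on the console. The class and method names are hypothetical, and the charsets are assumptions: the thread does not say which encodings were actually involved. Code page 437 appears only because it is one charset that maps the byte A0 to á, and it is skipped if the JVM does not provide it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NbspMojibake {
    // Encode text with one charset, then decode the same bytes with another,
    // which is what happens when writer and reader disagree on the encoding.
    public static String misread(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    // Render a string as a list of code points, e.g. "U+00C2 U+00A0".
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String nbsp = "\u00A0";

        // UTF-8 writes U+00A0 as the bytes C2 A0; read as ISO-8859-1 they
        // become two characters.
        System.out.println(codePoints(
                misread(nbsp, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)));
        // prints U+00C2 U+00A0

        // Assumption: ISO-8859-1 writes U+00A0 as the single byte A0, and
        // code page 437 maps that byte to U+00E1 (the letter a with acute).
        if (Charset.isSupported("IBM437")) {
            System.out.println(codePoints(
                    misread(nbsp, StandardCharsets.ISO_8859_1, Charset.forName("IBM437"))));
            // prints U+00E1
        }
    }
}
```

When the writer and the reader use the same charset, the round trip returns U+00A0 unchanged, which is consistent with the fix reported here.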

dominic, can you check your browser settings?

Maybe what mcdoctore is suggesting is a solution?

Original comment by kkape...@gmail.com on 19 Nov 2010 at 3:19

It is working now. Thanks.

Original comment by dominic....@gmail.com on 19 Nov 2010 at 4:30

Closed since it was apparently a browser issue.

Original comment by kkape...@gmail.com on 20 Nov 2010 at 10:51

  • Changed state: Done