AngleSharp / AngleSharp.Diffing

A library that makes it possible to compare two AngleSharp node lists and get a list of differences between them.

Different order in style tag is seen as a difference

ortrails opened this issue

Hey guys! I've been doing an HTML comparison of documents that differ only in the order of items in various style attributes. I thought that the default diff options should prevent a difference from being detected in that situation. The difference looks like this in one HTML doc:
<td style="width:25%;background-color:#C0C0C0">Location</td>

...and this in the other:
<td style="background-color:#C0C0C0;width:25%">Location</td>

Is there an option I can turn on to prevent this from being identified?
I suspect that the problem may be related to dropping the trailing semicolon after the final item in the style. The library I am using to output HTML does this, perhaps to reduce file size a bit.

UPDATE: Actually the behavior I'm seeing is inconsistent results in my test. Sometimes I get 5 diffs, other times 4, other times 0.

Thanks!

hi @ortrails

The order of the style definition can matter, e.g. these two will look visually different in the browser:

<h1 style="color:blue;text-align:center;color:red;">This is a header</h1>
<h1 style="color:red;text-align:center;color:blue;">This is a header</h1>

Trailing semicolons are not a problem; they are stripped, if present, when comparing.
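To make the two points above concrete, here is a minimal, language-neutral sketch in Python. It is not AngleSharp.Diffing code. The helper names `parse_style` and `effective_style` are made up for illustration, the parser naively splits on `;` (so it would mishandle values that contain a semicolon, such as some `url(...)` values), and it ignores `!important`.

```python
from __future__ import annotations


def parse_style(style: str) -> list[tuple[str, str]]:
    """Split a style attribute into (property, value) pairs, in source order."""
    declarations = []
    for chunk in style.split(";"):
        name, sep, value = chunk.partition(":")
        # Empty chunks (e.g. after a trailing semicolon) are skipped here.
        if sep and name.strip():
            declarations.append((name.strip().lower(), value.strip()))
    return declarations


def effective_style(style: str) -> dict[str, str]:
    """Apply declarations in order; a later duplicate overrides an earlier one."""
    result: dict[str, str] = {}
    for name, value in parse_style(style):
        result[name] = value
    return result


blue_then_red = effective_style("color:blue;text-align:center;color:red;")
red_then_blue = effective_style("color:red;text-align:center;color:blue;")
print(blue_then_red["color"])  # red
print(red_then_blue["color"])  # blue

# A trailing semicolon does not change the parsed declarations.
print(parse_style("width:25%;") == parse_style("width:25%"))  # True
```

The two headers contain the same set of declarations, yet they end up with different colors, which is why reordering cannot be treated as harmless in general.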

If you want to ignore the style attribute when comparing, you can use the :ignore diff modifier to skip it during testing. In your control markup, add style:ignore and you are good to go (https://github.com/AngleSharp/AngleSharp.Diffing/wiki/Diffing-Options#ignore-attributes-during-diffing).

The question is whether it is actually safe to ignore the order of the individual style definitions in a style attribute, as long as no style definition appears twice (i.e. where the latter would override the first).

@FlorianRappl you are much more of an HTML5 spec expert than me, can you perhaps answer this?

What we could do is a stable sort of the style definitions by name before comparing, e.g.:

sorting:
"color:blue; text-align:center; color:red;" to
"color:blue; color:red; text-align:center;"

and

"color:red; text-align:center; color:blue;" to
"color:red; color:blue; text-align:center;"

Then the test would still fail if there are two style definitions of the same type in the style attribute but with different values, but it would pass the test when there are no duplicates and their individual values are the same.
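The stable-sort idea can be sketched as follows. This is an illustrative Python sketch, not the library's implementation. `parse_style`, `sorted_style` and `styles_match` are hypothetical names, and the parser naively splits on `;`. It relies on Python's `sorted()` being stable, so declarations with the same property name keep their original relative order.

```python
from __future__ import annotations


def parse_style(style: str) -> list[tuple[str, str]]:
    """Split a style attribute into (property, value) pairs, in source order."""
    declarations = []
    for chunk in style.split(";"):
        name, sep, value = chunk.partition(":")
        if sep and name.strip():
            declarations.append((name.strip().lower(), value.strip()))
    return declarations


def sorted_style(style: str) -> list[tuple[str, str]]:
    """Stable sort by property name only; duplicates keep their source order."""
    return sorted(parse_style(style), key=lambda declaration: declaration[0])


def styles_match(control: str, test: str) -> bool:
    """Compare two style attributes after stable-sorting both by property name."""
    return sorted_style(control) == sorted_style(test)


# No duplicate properties: reordering is no longer reported as a difference.
print(styles_match("width:25%;background-color:#C0C0C0",
                   "background-color:#C0C0C0;width:25%"))  # True

# Duplicate "color" with different values: the order still matters.
print(styles_match("color:blue; text-align:center; color:red;",
                   "color:red; text-align:center; color:blue;"))  # False
```

Sorting on the name alone, rather than on the whole declaration, is what keeps the duplicate case failing: sorting on (name, value) would reorder the two `color` entries identically on both sides and hide the difference.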

Yeah, sorting there is stable. The exception is vendor-specific prefixes. Also, if the same prop is used twice, the order matters (usually this is to first have a legacy def and then a modern def, so that the modern one overrides the legacy one and is otherwise ignored).
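One conservative way to respect both caveats is sketched below in Python. It is an assumption about how such logic could look, not something settled in this thread: fall back to a strict, order-sensitive comparison whenever a vendor-prefixed property is present, and otherwise compare after a stable sort by name so that duplicated properties keep their order. All function names are hypothetical, and the parser naively splits on `;`.

```python
from __future__ import annotations


def parse_style(style: str) -> list[tuple[str, str]]:
    """Split a style attribute into (property, value) pairs, in source order."""
    declarations = []
    for chunk in style.split(";"):
        name, sep, value = chunk.partition(":")
        if sep and name.strip():
            declarations.append((name.strip().lower(), value.strip()))
    return declarations


def is_vendor_prefixed(name: str) -> bool:
    """True for names like -webkit-foo; custom properties (--foo) are excluded."""
    return name.startswith("-") and not name.startswith("--")


def styles_match_conservative(control: str, test: str) -> bool:
    control_decls = parse_style(control)
    test_decls = parse_style(test)
    # Assumption: with vendor prefixes present, keep the comparison order-sensitive.
    if any(is_vendor_prefixed(name) for name, _ in control_decls + test_decls):
        return control_decls == test_decls

    def by_name(declaration: tuple[str, str]) -> str:
        return declaration[0]

    # Stable sort: repeated properties keep their relative order.
    return sorted(control_decls, key=by_name) == sorted(test_decls, key=by_name)


print(styles_match_conservative("width:25%;color:red", "color:red;width:25%"))  # True
print(styles_match_conservative("-webkit-transition:none;transition:none",
                                "transition:none;-webkit-transition:none"))  # False
```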

I'll think about some logic here that should reflect that.

Thanks for the help guys. Much appreciated.

@ortrails, let me keep this open and around. If @FlorianRappl has a good way for us to add support for this in a safe way, then it's a good addition.

I would also like to have this feature. We are porting mjml to .NET Core and it would be great to have this feature to make the port easier. Sometimes they set dozens of styles, and to keep track of all the changes more easily we sort them alphabetically. I have seen a single case where this could cause a problem in mjml.

Since we cannot guarantee that this will always work, my suggestion is instead that we just create a customized StyleAttributeComparer.cs that does what you need. Then, when you want to do a comparison, simply register the custom style attribute comparer and then you have that feature.

@SebastianStehle / @ortrails, if you want to submit a PR with the custom OrderedStyleAttributeComparer (just a suggestion for a name) or simply want to build one and share it here that would be great.