Change Core.ComparisonSource.GetPathIndex() to return the Index inside ChildNodes instead of Children
edxlhornung opened this issue · comments
New Feature Proposal
Description
The path property of the diff nodes returned by HtmlDiffer.compare() does not work for TextNode nodes.
Background
We are using this library to check the difference between two HTML elements and to style those elements so users can see what changed between them. Since the nodes inside the diffs returned by HtmlDiffer don't refer to the original HtmlDocument passed to the compare method, I need to use the path property to traverse the original document in order to add a style (a red strikethrough) to the nodes that the diff marks as MissingNodeDiff.
The path property works fine for all HtmlElement nodes, as it returns the correct index inside the list returned by Children. However, since TextNode nodes are not present in the Children property, I must use ChildNodes to access them. In this case, the path does not return the correct index for all of my TextNode nodes.
Example
Here is the path returned by the diff node. I want to access the text(8) element inside the p(0) node. Accessing the p(0) element is not a problem.
Here is what is returned by the Children property inside the p(0) node. We can see that no text nodes are present.
Here is what is returned by the ChildNodes property inside the p(0) node. We can see that there are text nodes. However, although there is a TextNode at index 8, it is not the correct node. The correct node should be at index 10 (I compared the content of the node returned by the diff with the content of all nodes inside ChildNodes).
Suggestion
I would suggest changing the current implementation of ComparisonSource.GetPathIndex() to:
    private static int GetPathIndex(INode node)
    {
        var parent = node.Parent;
        if (parent is not null)
        {
            // Search ChildNodes (all node types), not Children (elements only),
            // so that text nodes get an index that matches the original document.
            var childNodes = parent.ChildNodes;
            for (int index = 0; index < childNodes.Length; index++)
            {
                if (ReferenceEquals(childNodes[index], node))
                    return index;
            }
        }

        throw new InvalidOperationException("Unexpected node tree state. The node was not found in its parents child nodes collection.");
    }
Also, I am not sure how it should be implemented, but the path property is a little hard to work with, as it requires extracting the indexes from the path string with a regex.
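For anyone hitting the same problem, the regex extraction mentioned above can be sketched roughly as follows. This is only an illustration: DiffPath and ParseIndexes are hypothetical names, not part of the library, and the sketch assumes that every path segment ends with a zero-based index in parentheses, as in p(0) and text(8).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the library.
public static class DiffPath
{
    // Captures the number inside the parentheses of each segment, e.g. "p(0)" -> 0.
    // Assumption: every path segment ends with "(index)".
    private static readonly Regex SegmentIndex =
        new Regex(@"\((\d+)\)", RegexOptions.Compiled);

    // Returns the index of every segment in the path, outermost segment first.
    public static IReadOnlyList<int> ParseIndexes(string path)
    {
        if (path is null) throw new ArgumentNullException(nameof(path));

        return SegmentIndex.Matches(path)
            .Cast<Match>()
            .Select(m => int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture))
            .ToList();
    }
}
```

Calling DiffPath.ParseIndexes("p(0) > text(8)") yields the indexes 0 and 8. The " > " separator in that sample string is an assumption; the regex ignores whatever sits between the segments. The resulting indexes only lead to the right node if they are applied to the same collection that GetPathIndex counted in.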
Thank you for the suggestion. It's been a while since I was knee deep in the lib, so I cannot remember if there is a reason I did not use ChildNodes. Maybe @FlorianRappl has some insights here.
Either way, I am unfortunately very busy at work at the moment, but if you want to experiment and make the suggested change yourself, the current test suite should catch any regressions, as it's rather comprehensive.
Otherwise I'll be able to look at this at a later time.
Hi,
Thanks for the quick response. I'll definitely work on that and follow with a PR.
However, I will have to change the test suite for that method, as it is designed to skip elements without children (i.e. text nodes and paragraphs). I'll add a couple of test cases with TextNodes.
ChildNodes contains all nodes (incl. comments, text nodes etc.) while Children only contains elements.
In a fixed match scenario you'd want to use ChildNodes, but if it's about equivalence, then Children and using something like InnerText for text nodes may be better. The reason is simple: comments etc. don't matter, and two text nodes may contain the same content as one text node. Furthermore, an equality comparison of two text nodes may be false even though their actual output is the same (e.g., due to how spacing and special characters are processed).
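To make the equivalence point concrete, here is a minimal sketch of comparing text by content rather than node by node. It works on plain strings and is not how the library compares nodes: TextEquivalence is a hypothetical helper, and collapsing whitespace runs into a single space is a simplifying assumption, since real HTML whitespace handling is more involved.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the library.
public static class TextEquivalence
{
    private static readonly Regex WhitespaceRun =
        new Regex(@"\s+", RegexOptions.Compiled);

    // Joins adjacent text fragments, collapses each whitespace run into one space,
    // and trims the ends. Assumption: this is "close enough" to rendered output.
    public static string Normalize(IEnumerable<string> textFragments)
    {
        var joined = string.Concat(textFragments);
        return WhitespaceRun.Replace(joined, " ").Trim();
    }

    // Two fragment sequences are treated as equivalent when their normalized text matches,
    // regardless of how many text nodes the content was split across.
    public static bool AreEquivalent(IEnumerable<string> left, IEnumerable<string> right)
        => string.Equals(Normalize(left), Normalize(right), StringComparison.Ordinal);
}
```

With this, the two fragments "Hello " and "world" are considered equivalent to the single fragment "Hello   world", although a node-by-node comparison would report a different node count and different text.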