antchfx / xmlquery

xmlquery is a Go XPath package for querying XML.

Home Page: https://github.com/antchfx/xpath

OutputXML should not call EscapeText when node type is TextNode.

leavesster opened this issue

I think OutputXML should return the same string that appears in the XML file, without escaping text content. Escaping XML attribute values is enough.

For example, given this XML:

<?xml version="1.0" encoding="utf-8"?>
	<class_list>
		<student xml:space="preserve">
			<name xml:space="default"> 
			Robert 
			</name>
			<grade>A+</grade>
		</student>
	</class_list>

Users expect the node's OutputXML to return the same string, rather than having '\n\t' converted to escaped character references.

I also found that xml escapes characters such as &, \n and \t into entities like &amp;. I think this may be how these strings end up escaped.

I think TextNode may not need this escaping; only attribute strings need it.

If you would like a PR to fix this, I can create one, but it may break a lot of tests.
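To illustrate the escaping described above, here is a minimal sketch using only the standard library's encoding/xml.EscapeText. The helper name escapeWithStdlib is made up for this example, and the sketch assumes (following the issue title) that OutputXML relies on this function for text nodes; it does not reproduce xmlquery's own code.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// escapeWithStdlib is a hypothetical helper for this example only.
// It returns s as escaped by encoding/xml.EscapeText.
func escapeWithStdlib(s string) string {
	var buf bytes.Buffer
	// EscapeText only returns an error if the writer fails,
	// which bytes.Buffer does not do.
	_ = xml.EscapeText(&buf, []byte(s))
	return buf.String()
}

func main() {
	// Newline and tab come back as numeric character references.
	fmt.Println(escapeWithStdlib("A&B\n\t")) // prints A&amp;B&#xA;&#x9;
}
```

Besides & and the other markup characters, EscapeText also rewrites \n and \t, which matches the output reported in this issue.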

Hello, is this the same as #66?

Not only whitespace, but also \t and \n. These control characters are escaped to &#xA;&#x9;&#x9;&#x9;

Maybe an example would help. Take this original XML:

<?xml version="1.0" encoding="utf-8"?>
<corpus xml:space="preserve">
	<p>Lorem	 
<a>ipsum</a> 
dolor</p>
</corpus>

Calling node.OutputXML(false) on it outputs:

<?xml version="1.0" encoding="utf-8"?><corpus xml:space="preserve">&#xA;&#x9;&#x9;&#x9;<p>Lorem&#x9; <a>ipsum</a> dolor</p>&#xA;&#x9;&#x9;</corpus>

Maybe this function should leave \t and \n as literal characters, so that node.OutputXML(false) simply outputs the original string.

These characters (\t, \n) do not need to be escaped; only the characters below do:
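As a side note on what is at stake, the sketch below uses only the standard library's encoding/xml decoder (not xmlquery) to compare the two forms. The type textHolder and the function decodeText are hypothetical names for this example. It suggests that the escaped and literal forms decode to the same text, so the difference is in the serialized bytes rather than in the parsed content.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// textHolder is a hypothetical type that captures an element's character data.
type textHolder struct {
	Text string `xml:",chardata"`
}

// decodeText parses a single-element document and returns its text content.
func decodeText(doc string) (string, error) {
	var v textHolder
	err := xml.Unmarshal([]byte(doc), &v)
	return v.Text, err
}

func main() {
	escaped, err := decodeText("<p>&#xA;&#x9;Lorem</p>")
	if err != nil {
		panic(err)
	}
	literal, err := decodeText("<p>\n\tLorem</p>")
	if err != nil {
		panic(err)
	}
	// Both forms yield the same character data after parsing.
	fmt.Println(escaped == literal) // prints true
}
```

So a consumer that re-parses the output sees no difference, while a user who compares the output with the source file as a string does.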

Name  Character  Unicode code point (decimal)  Standard
quot  "          U+0022 (34)                   XML 1.0
amp   &          U+0026 (38)                   XML 1.0
apos  '          U+0027 (39)                   XML 1.0
lt    <          U+003C (60)                   XML 1.0
gt    >          U+003E (62)                   XML 1.0
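The escaping proposed here could look roughly like the sketch below. It replaces only the five predefined entities listed above and leaves \t and \n untouched. The names predefinedEntities and escapeMinimal are hypothetical and are not part of xmlquery; this is one possible approach, not the library's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// predefinedEntities rewrites only the five XML 1.0 predefined entities.
// strings.Replacer scans the input once, so replacements are never re-escaped.
var predefinedEntities = strings.NewReplacer(
	"&", "&amp;",
	"<", "&lt;",
	">", "&gt;",
	`"`, "&quot;",
	"'", "&apos;",
)

// escapeMinimal is a hypothetical helper, not an xmlquery function.
// Tabs and newlines pass through unchanged.
func escapeMinimal(s string) string {
	return predefinedEntities.Replace(s)
}

func main() {
	// %q makes the preserved tab and newline visible in the output.
	fmt.Printf("%q\n", escapeMinimal("Lorem\t \n<a> & \"b\""))
}
```

With an approach like this, whitespace inside an element marked xml:space="preserve" would be written back as it appeared in the source.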

It is slightly different from #66, which asks to make whitespace handling configurable by the user.

This issue only asks that \t and \n not be escaped, so that the original string is kept. This would be a breaking change (when the XML has xml:space="preserve"), but I think not escaping \t and \n is friendlier to users.