OutputXML should not call EscapeText when node type is TextNode.
leavesster opened this issue · comments
I think the `OutputXML` function should return the same string that appears in the XML file, without escaping text. Escaping XML attribute values is enough.
When the XML is:
<?xml version="1.0" encoding="utf-8"?>
<class_list>
<student xml:space="preserve">
<name xml:space="default">
Robert
</name>
<grade>A+</grade>
</student>
</class_list>
Users expect the node's `OutputXML` to return the same string, without turning `\n` and `\t` into character references.
I also found that xml escapes the string `&\n\t` to `&amp;&#xA;&#x9;`. I think this may be how you escape these strings.
I think a TextNode does not need this escaping; only attribute strings need it.
If you need a PR to fix this, I can create one, but it may break a lot of tests.
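To make the behavior described above concrete, here is a minimal sketch, assuming the `EscapeText` in the title is Go's standard `encoding/xml.EscapeText`. The helper name `escapeWithStdlib` is hypothetical and is not part of xmlquery.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// escapeWithStdlib is a hypothetical helper (not part of xmlquery) that
// runs s through the standard library's xml.EscapeText.
func escapeWithStdlib(s string) string {
	var buf bytes.Buffer
	if err := xml.EscapeText(&buf, []byte(s)); err != nil {
		return ""
	}
	return buf.String()
}

func main() {
	// The ampersand, newline, and tab are all rewritten.
	fmt.Println(escapeWithStdlib("&\n\t")) // prints: &amp;&#xA;&#x9;
}
```

The standard escaper treats `\n` and `\t` like markup characters, which would explain why whitespace in a text node comes back as character references.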
Hello, Is this the same as #66?
Not only whitespace, but also `\t` and `\n`. These control characters are escaped to character references such as `&#x9;` and `&#xA;`.
Maybe we need an example, such as this original XML:
<?xml version="1.0" encoding="utf-8"?>
<corpus xml:space="preserve">
<p>Lorem
<a>ipsum</a>
dolor</p>
</corpus>
If you call `node.OutputXML(false)`, it outputs:
<?xml version="1.0" encoding="utf-8"?><corpus xml:space="preserve">
			<p>Lorem	 <a>ipsum</a> dolor</p>
		</corpus>
Maybe this function should leave `\t` and `\n` as literal characters, so that `node.OutputXML(false)` simply outputs the original string.
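One way to check that leaving `\t` and `\n` unescaped in text nodes loses nothing is to parse both forms and compare the resulting text. The sketch below uses only Go's `encoding/xml` decoder, not xmlquery itself. The helper `textOf` is a hypothetical name introduced for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// textOf is a hypothetical helper that concatenates all character data
// found in doc, as seen by the standard library decoder.
func textOf(doc string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var sb strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return "", err
		}
		if cd, ok := tok.(xml.CharData); ok {
			sb.Write(cd)
		}
	}
}

func main() {
	literal, _ := textOf("<p>Lorem\n\tipsum</p>")
	escaped, _ := textOf("<p>Lorem&#xA;&#x9;ipsum</p>")
	// Both documents carry the same text content.
	fmt.Println(literal == escaped) // prints: true
}
```

Because both inputs decode to the same text, emitting the literal characters should round-trip for text nodes. Attribute values are a different case, since XML parsers normalize literal tabs and newlines inside attributes to spaces.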
These characters (`\t`, `\n`) do not need to be escaped; only the characters below do:
name | characters | Unicode | version |
---|---|---|---|
quot | " | U+0022 (34) | XML 1.0 |
amp | & | U+0026 (38) | XML 1.0 |
apos | ' | U+0027 (39) | XML 1.0 |
lt | < | U+003C (60) | XML 1.0 |
gt | > | U+003E (62) | XML 1.0 |
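The table above can be turned into a small text escaper. This is only a sketch of the proposal, not xmlquery's actual code: `textEscaper` and `escapeMinimal` are hypothetical names, and the choice of `strings.NewReplacer` is an assumption about how one might implement it.

```go
package main

import (
	"fmt"
	"strings"
)

// textEscaper is a hypothetical escaper covering only the five predefined
// XML entities from the table; tabs and newlines are left untouched.
var textEscaper = strings.NewReplacer(
	"&", "&amp;",
	"<", "&lt;",
	">", "&gt;",
	`"`, "&quot;",
	"'", "&apos;",
)

// escapeMinimal applies textEscaper to s in a single pass, so the "&"
// produced by one replacement is never escaped a second time.
func escapeMinimal(s string) string {
	return textEscaper.Replace(s)
}

func main() {
	fmt.Printf("%q\n", escapeMinimal("A+ & <B>\n\t")) // prints: "A+ &amp; &lt;B&gt;\n\t"
}
```

A text node serialized this way keeps its original line breaks and indentation, while attribute values could still go through the stricter escaper.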
> Hello, Is this the same as #66?
It is slightly different from #66, which wants to make whitespace handling configurable by the user.
This issue asks not to escape `\t` and `\n` and simply keep the original string. This would be a breaking change (when the XML has `xml:space="preserve"`), but I think not escaping `\t` and `\n` may be friendlier to users.