XML attribute value with ">" breaks syntax highlighting
boghyon opened this issue
This issue is similar to #339, but this time it's about > instead of <.
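- According to the XML specification, the left angle bracket (<) MUST be escaped. (no problem)
- The right angle bracket (>), however, doesn't need to be:
  The right angle bracket (>) may be represented using the string "&gt;", and MUST, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.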
Borrowing @amroamroamro's example, you can see that this document is valid:
<?xml version="1.0"?>
<Person AgeCategory=">3" ></Person>
With prettify, the highlighting unfortunately breaks.
Source: OpenUI5 Walkthrough
You are right, <Person AgeCategory=">3" ></Person> is valid XML/HTML. So this is a bug.
FYI, here's the part that handles HTML/XML markup:
And the offending regular expression that matches tags is this one:
['lang-in.tag', /^(<\/?[a-z][^<>]*>)/i]
The text captured by this pattern is then forwarded to the 'lang-in.tag' handler, which in turn runs on the captured text and decorates the tokens inside it using its own rules, like:
[PR_ATTRIB_VALUE, /^(?:\"[^\"]*\"?|\'[^\']*\'?)/, null, '\"\'']
[PR_TAG, /^^<\/?[a-z](?:[\w.:-]*\w)?|\/?>$/i]
[PR_ATTRIB_NAME, /^(?!style[\s=]|on)[a-z](?:[\w:-]*\w)?/i]
Given the first regexp above, /^(<\/?[a-z][^<>]*>)/i, you can see how it would correctly match something like <tag name="val">, but breaks for something like <tag name=">val">: [^<>]* cannot cross the > inside the attribute value, so the match ends early at <tag name=">.
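To make the failure concrete, here is a small sketch that runs the tag regexp quoted above against both inputs. The regexp is copied from this thread; the helper name matchTag is made up for the demo and is not part of code-prettify.

```javascript
// Outer tag rule quoted above: it decides where a tag starts and ends.
const TAG_RE = /^(<\/?[a-z][^<>]*>)/i;

// Hypothetical helper (not a code-prettify function): returns the text
// that the rule would treat as one complete tag, or null if nothing matches.
function matchTag(source) {
  const m = TAG_RE.exec(source);
  return m ? m[1] : null;
}

const plainTag = matchTag('<tag name="val">');
const brokenTag = matchTag('<tag name=">val">');

console.log(plainTag);  // <tag name="val">
console.log(brokenTag); // <tag name=">
```

In the second case everything after the first > is left outside the tag (val">), so it never reaches the 'lang-in.tag' rules and is highlighted as something else.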
Hence why you must escape < and > inside attribute values, really for code-prettify's sake, not the W3C specs :)
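As a quick check of that workaround, the sketch below feeds the issue's example to the same tag regexp with the > written as &gt;. The variable names are only for illustration.

```javascript
// Same tag rule as quoted earlier in this thread.
const TAG_RULE = /^(<\/?[a-z][^<>]*>)/i;

// The attribute value from the issue, with ">" escaped as "&gt;".
const escapedSource = '<Person AgeCategory="&gt;3" ></Person>';

const escapedMatch = TAG_RULE.exec(escapedSource);
const escapedTag = escapedMatch ? escapedMatch[1] : null;

console.log(escapedTag); // <Person AgeCategory="&gt;3" >
```

With the escape in place the rule captures the whole opening tag, so the attribute value is handed to the inner rules intact.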
I feel like this should be mentioned in a FAQ somewhere; code-prettify does not implement a full-blown parser, it simply attempts to do syntax highlighting using regular expressions. I say "attempt" because it cannot correctly highlight every piece of code using only regexps. But on the web and for the purpose of presenting snippets of code, small highlighting errors are usually acceptable given the speed and small-size gains compared to implementing a full parser for every language supported.
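For what it's worth, a regexp can still get closer without a full parser. The following is only a sketch of one possible quote-aware variant of the tag rule, assuming quoted attribute values should be consumed as a unit. It is not what code-prettify ships, the names are hypothetical, and it has not been tested against the other language handlers.

```javascript
// Hypothetical variant of the tag rule (an assumption, not code-prettify's code):
// inside the tag, either consume a single character that is not <, >, " or ',
// or consume a complete quoted string, so a ">" inside quotes does not end the tag.
const QUOTE_AWARE_TAG_RE = /^(<\/?[a-z](?:[^<>"']|"[^"]*"|'[^']*')*>)/i;

// Hypothetical helper for the demo.
function matchTagQuoteAware(source) {
  const m = QUOTE_AWARE_TAG_RE.exec(source);
  return m ? m[1] : null;
}

const doubleQuoted = matchTagQuoteAware('<Person AgeCategory=">3" ></Person>');
const singleQuoted = matchTagQuoteAware("<tag name='a>b'>");

console.log(doubleQuoted); // <Person AgeCategory=">3" >
console.log(singleQuoted); // <tag name='a>b'>
```

One trade-off to be aware of: a tag with an unterminated quote no longer matches this rule at all, whereas the original rule would still capture it up to the first >.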