Limiting parsed numeric entities
Ygg01 opened this issue · comments
Ygg01 commented
After looking at the JavaScript implementation of XML5 and this section of the HTML5 spec, I limited character tokenization to just:
- Characters that are less than 0x10FFFF
- Excluding characters in range: 0xD800 to 0xDFFF
But should we expand the list of restricted characters to the full one used by HTML5?
Otherwise, return a character token for the Unicode character whose code point is that number.
Additionally, if the number is in the range 0x0001 to 0x0008, 0x000D to 0x001F, 0x007F to
0x009F, 0xFDD0 to 0xFDEF, or is one of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF,
0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF, 0x6FFFE,
0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF, 0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF,
0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE,
0xFFFFF, 0x10FFFE, or 0x10FFFF, then this is a parse error.
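To make the difference between the two rule sets concrete, here is a minimal sketch in Rust. The function names `is_allowed_minimal` and `is_html5_parse_error` are hypothetical and not taken from any existing tokenizer. The first check reads the "less than 0x10FFFF" bound literally, which is an assumption about the current behaviour. The second covers the list quoted above and relies on the observation that every entry from 0xFFFE through 0x10FFFF has its low 16 bits equal to 0xFFFE or 0xFFFF.

```rust
/// Hypothetical helper: the minimal restriction described in this issue.
/// Assumption: "less than 0x10FFFF" is read literally as a strict bound.
fn is_allowed_minimal(n: u32) -> bool {
    n < 0x10FFFF && !(0xD800..=0xDFFF).contains(&n)
}

/// Hypothetical helper: true if `n` is in the HTML5 list quoted above,
/// i.e. a code point whose numeric reference HTML5 treats as a parse error.
fn is_html5_parse_error(n: u32) -> bool {
    // Control characters and the 0xFDD0..=0xFDEF block, as listed in the quote.
    let in_listed_ranges = matches!(
        n,
        0x0001..=0x0008 | 0x000B | 0x000D..=0x001F | 0x007F..=0x009F | 0xFDD0..=0xFDEF
    );
    // 0xFFFE, 0xFFFF, 0x1FFFE, ..., 0x10FFFF: the last two code points of
    // each plane, so the low 16 bits are 0xFFFE or 0xFFFF.
    let is_plane_end = n <= 0x10FFFF && (n & 0xFFFE) == 0xFFFE;
    in_listed_ranges || is_plane_end
}
```

With these definitions, `is_html5_parse_error(0x000B)` and `is_html5_parse_error(0x1FFFE)` are true, while `is_html5_parse_error(0x000A)` is false. Note that the HTML5 text still returns the character in these cases and only flags a parse error, so a tokenizer could call the second check purely for error reporting.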
Anne van Kesteren commented
They should probably work, but making them a parse error might be okay.