Limiting parsed numeric entities

Question

Ygg01 opened this issue 9 years ago

After looking at the JavaScript implementation of XML5 and this section of the HTML5 spec, I limited character tokenization to just:

Characters that are less than 0x10FFFF
Excluding characters in the range 0xD800 to 0xDFFF
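
For illustration, the current limit could be sketched roughly as below. This is not the actual tokenizer code; the function name `is_allowed_code_point` is hypothetical, and the upper bound is taken literally from the description above (strictly less than 0x10FFFF), which a real implementation may want to make inclusive.

```python
# Hypothetical sketch of the current limit; not taken from the tokenizer.
SURROGATE_START = 0xD800
SURROGATE_END = 0xDFFF
UPPER_LIMIT = 0x10FFFF  # assumption: exclusive, as worded in the post


def is_allowed_code_point(n: int) -> bool:
    """Return True if a numeric entity value passes the current limit."""
    if n < 0 or n >= UPPER_LIMIT:
        return False
    # Surrogate code points are not valid scalar values on their own.
    return not (SURROGATE_START <= n <= SURROGATE_END)
```

With this check, `&#x41;` would be accepted, while `&#xD800;` and anything at or above 0x10FFFF would be rejected.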

But should we expand the list of restricted characters to the full one used by HTML5?

 Otherwise, return a character token for the Unicode character whose code point is that number.   
 Additionally, if the number is in the range 0x0001 to 0x0008, 0x000D to 0x001F, 0x007F to 
 0x009F, 0xFDD0 to 0xFDEF, or is one of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 
 0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF, 0x6FFFE,   
 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF, 0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF,  
 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE,  
 0xFFFFF, 0x10FFFE, or 0x10FFFF, then this is a parse error.
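
If we did adopt the full HTML5 list, the check might look something like the sketch below. The function name `is_html5_parse_error_code_point` is made up for this example, and it only covers the ranges quoted above. The long run of 0x?FFFE / 0x?FFFF values is folded into one bit test, on the assumption that it is meant to cover the last two code points of every plane.

```python
# Hypothetical sketch of the HTML5 list quoted above; names are illustrative.
PARSE_ERROR_RANGES = (
    (0x0001, 0x0008),
    (0x000D, 0x001F),
    (0x007F, 0x009F),
    (0xFDD0, 0xFDEF),
)


def is_html5_parse_error_code_point(n: int) -> bool:
    """Return True if n is in the quoted HTML5 parse-error list."""
    if n == 0x000B:
        return True
    if any(lo <= n <= hi for lo, hi in PARSE_ERROR_RANGES):
        return True
    # 0xFFFE, 0xFFFF, 0x1FFFE, ..., 0x10FFFE, 0x10FFFF:
    # the low 16 bits are 0xFFFE or 0xFFFF on planes 0 through 16.
    return 0 <= n <= 0x10FFFF and (n & 0xFFFE) == 0xFFFE
```

Note that, per the quoted text, these code points still produce a character token; the function only flags that a parse error should be reported.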

Anne van Kesteren · Answer 1 · Sat Apr 11 2015 14:39:27 GMT+0800 (China Standard Time)

They should probably work, but making them a parse error might be okay.