Ygg01 / xml5_draft

Draft for the XML5 proposal.

Limiting parsed numeric entities

Ygg01 opened this issue

After looking at the JavaScript implementation of XML5 and this section of the HTML5 spec, I limited numeric character references during tokenization to just:

  • Characters that are less than 0x10FFFF
  • Excluding characters in the range 0xD800 to 0xDFFF
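
For illustration, the two checks above could be sketched roughly like this. This is not the draft's actual tokenizer code; the function name `is_allowed_numeric_reference` and the constants are hypothetical, and the sketch reads the first bullet literally (strictly less than 0x10FFFF). Switch the comparison to `<=` if 0x10FFFF itself is meant to pass.

```python
# Hypothetical names; a sketch of the two checks listed above,
# not the actual XML5 tokenizer code.
UPPER_LIMIT = 0x10FFFF        # first bullet: code point must be below this
SURROGATE_START = 0xD800      # second bullet: start of the excluded range
SURROGATE_END = 0xDFFF        # second bullet: end of the excluded range


def is_allowed_numeric_reference(code_point: int) -> bool:
    """Return True if a numeric character reference passes both checks."""
    # Literal reading of "less than 0x10FFFF" (an assumption, see lead-in).
    if code_point >= UPPER_LIMIT:
        return False
    # Surrogate code points are not valid scalar values on their own.
    if SURROGATE_START <= code_point <= SURROGATE_END:
        return False
    return True
```

With this reading, `&#x41;` would be accepted, while `&#xD800;` and `&#x110000;` would be rejected.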

But should we expand the list of restricted characters to the full one used by HTML5?

 Otherwise, return a character token for the Unicode character whose code point is that number.   
 Additionally, if the number is in the range 0x0001 to 0x0008, 0x000D to 0x001F, 0x007F to 
 0x009F, 0xFDD0 to 0xFDEF, or is one of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 
 0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF, 0x6FFFE,   
 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF, 0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF,  
 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE,  
 0xFFFFF, 0x10FFFE, or 0x10FFFF, then this is a parse error.

These characters should probably still work, but making them a parse error might be okay.
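
If we did adopt the HTML5 list, the check might look something like the sketch below. The function name `is_html5_restricted` is hypothetical. The long run of 0x...FFFE / 0x...FFFF values in the quoted text is the last two code points of each of the 17 planes, so the sketch tests for those with a bit mask instead of listing all 34 values.

```python
# Hypothetical helper; a sketch of the HTML5 list quoted above,
# not part of the current draft.
RESTRICTED_RANGES = (
    (0x0001, 0x0008),
    (0x000D, 0x001F),
    (0x007F, 0x009F),
    (0xFDD0, 0xFDEF),
)


def is_html5_restricted(code_point: int) -> bool:
    """Return True if HTML5 treats this numeric reference as a parse error."""
    if any(low <= code_point <= high for low, high in RESTRICTED_RANGES):
        return True
    if code_point == 0x000B:
        return True
    # 0xFFFE, 0xFFFF, 0x1FFFE, ... 0x10FFFF: the low 16 bits are
    # 0xFFFE or 0xFFFF in every plane up to 0x10.
    return code_point <= 0x10FFFF and (code_point & 0xFFFE) == 0xFFFE
```

Per the quoted text, HTML5 still returns a character token for these code points and only flags a parse error, so a tokenizer could call this purely to report the error and carry on.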