Unknown Unicode characters in inline HTML cause syntax errors
imuli opened this issue
When parsing
hi <U+F0004> bye
(<U+F0004> stands for the literal character, which does not render here)
I get
==> plane_15.php
syntax error: unexpected $unk at line 1
| [*node.Root]
| "Position": Pos{Line: 1-1 Pos: 1-12};
| "Stmts":
| [*stmt.InlineHtml]
| "Position": Pos{Line: 1-1 Pos: 1-3};
| "Value": hi ;
| [*stmt.InlineHtml]
| "Position": Pos{Line: 1-1 Pos: 9-12};
| "Value": bye
;
rather than
==> plane_15.php
| [*node.Root]
| "Position": Pos{Line: 1-1 Pos: 1-12};
| "Stmts":
| [*stmt.InlineHtml]
| "Position": Pos{Line: 1-1 Pos: 1-12};
| "Value": hi bye
;
The character in there is U+F0004, from Supplementary Private Use Area-A, a range commonly used with custom fonts to render character-like glyphs in text on the web.
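For anyone who wants to confirm what kind of character this is, here is a minimal Go sketch. It only uses the standard library; the helper name describe is made up for illustration and is not part of the parser. It checks that U+F0004 falls in the private-use category (Co) and reports the length of its UTF-8 encoding. If the positions in the dump are byte offsets, a four-byte character plus the space after it would be consistent with the gap between Pos 1-3 and Pos 9-12.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// describe is a hypothetical helper (not from the parser). It reports whether
// r belongs to the Unicode private-use category (Co) and how many bytes its
// UTF-8 encoding occupies.
func describe(r rune) (privateUse bool, utf8Len int) {
	return unicode.Is(unicode.Co, r), utf8.RuneLen(r)
}

func main() {
	const r = '\U000F0004' // the character from the report
	private, n := describe(r)
	fmt.Printf("%U private-use=%t utf8-bytes=%d\n", r, private, n)
}
```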
I'll submit a pull request with the fix, which simply separates EOF from other uncategorized characters in the classifier.
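To illustrate the idea, here is a rough sketch of a character classifier that keeps EOF in its own class instead of lumping it together with every uncategorized character. This is not the parser's actual lexer code: the names class, classify, eof and the class constants are all hypothetical, and the real classifier may look quite different.

```go
package main

import "fmt"

// class is a hypothetical character class used by an illustrative lexer.
type class int

const (
	classEOF        class = iota // end of input only
	classWhitespace              // space, tab, newline, carriage return
	classLetter                  // ASCII letters and underscore
	classDigit                   // ASCII digits
	classOther                   // everything else, including private-use code points
)

// eof is an assumed sentinel value for end of input; no valid rune is negative.
const eof rune = -1

// classify maps a rune to its class. EOF is tested first and explicitly, so an
// uncategorized character such as U+F0004 can never be mistaken for it.
func classify(r rune) class {
	switch {
	case r == eof:
		return classEOF
	case r == ' ' || r == '\t' || r == '\n' || r == '\r':
		return classWhitespace
	case r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z'):
		return classLetter
	case r >= '0' && r <= '9':
		return classDigit
	default:
		return classOther
	}
}

func main() {
	for _, r := range []rune{'h', ' ', '\U000F0004', eof} {
		fmt.Printf("%d -> class %d\n", r, classify(r))
	}
}
```

With the two cases kept apart, a lexer built on such a classifier can treat classOther inside inline HTML as ordinary text and reserve classEOF for actually ending the token.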