intl-messageformat-parser: Some issues I ran into migrating to v4
Janpot opened this issue · comments
Which package?
intl-messageformat-parser
Is your feature request related to a problem? Please describe.
-
My library is using
intl-messageformat-parser
. To enable xml-like syntax in v3 I did some postprocessing on the AST. Which enabled me to allow self-closing tags. The new parser doesn't allow self-closing tags. Which means I can't move to the latest version as it fails to parse input that it used to parse before. -
My library also has a mode to disable xml syntax. This is now impossible with v4 as it always attempts to parse tags.
-
It also enabled me to more clearly describe errors for invalid syntaxes. i.e. I could differentiate invalid input like follows and show clear error messages about why it's invalid
<a.b></a.b> // Error: invalid tag name <a b="c"></a> // Error: no attributes allowed <></> // Error: invalid tag name
Now all of these syntaxes throw the same general parser error which isn't very descriptive:
Expected "#", "'", "\n", "{", argumentElement, double apostrophes, end of input, or tagElement but "<" found.
While this is not really a breaking change for me, it would be nice to be able to show more descriptive errors for invalid syntax.
Describe the solution you'd like
-
Allow self-closing tags in the parser.
children
can just be an empty array, ornull
in the AST node. -
Add a mode on the parser to turn off parsing tags and treat them as normal strings.
-
More descriptive error messages on syntax errors.
Describe alternatives you've considered
Remove self-closing tags feature and string mode from my library (breaking change) or stay on v3 of the parser.
- Self-closing tags do get parsed in the parser, it's just a text node instead. In general self-closing tags are pointless because ICU already has
{var}
syntax. - We don't plan to allow any settings to tune parsing. It's a PEG parser so doing that is non-trivial.
- Same reason as above, since it's a PEG parser, when it encounters a token that doesn't fit in the schema it triggers a syntax error. Keeping context would be non-trivial.
I'm fine with not supporting self-closing tags, but you essentially just created a parser that fails on what used to be valid ICU messages. I think an option to enable/disable xml syntax is not such a crazy request.
We don't plan to allow any settings to tune parsing. It's a PEG parser so doing that is non-trivial.
Just skimmed through the docs. But it looks like options can be passed in the initializer.
The code inside the initializer can access options passed to the parser using the
options
variable.
Wouldn't it be just a matter of returning a literal node here instead of a tag node when an option xml: false
is set? It's not trivial I guess, but doesn't seem crazy to me neither.
I'm looking into trying it out but running into weird line ending issues. Are you guys using windows exclusively on this project or something?
As I mentioned we don't plan to tweak this behavior. This powers our linter as well. Tag support is a popular ask and will be in the next standard of MessageFormat that we're working on.