jtmoulia / neotomex

A PEG parser/transformer with a pleasant Elixir DSL.



Unicode inputs are not parsable with exclusion character sets

brentspell opened this issue

```elixir
defmodule Unparsable do
  use Neotomex.ExGrammar

  @root true

  define :root, "[^a]*"
end

Unparsable.parse("pɪˈkɑːn")
```

This code raises the following error:

```
** (MatchError) no match of right hand side value: {"ɪ", "ˈkɑːn"}
    (neotomex) lib/neotomex/grammar.ex:325: Neotomex.Grammar.match/3
    (neotomex) lib/neotomex/grammar.ex:442: Neotomex.Grammar.match_zero_or_more/4
    (neotomex) lib/neotomex/grammar.ex:157: Neotomex.Grammar.parse/2
    /Users/brent/tmp/test.exs:1: Unparsable.parse/1
    (elixir) lib/code.ex:813: Code.require_file/2
    (mix) lib/mix/tasks/run.ex:145: Mix.Tasks.Run.run/5
    (mix) lib/mix/tasks/run.ex:85: Mix.Tasks.Run.run/1
    (mix) lib/mix/task.ex:331: Mix.Task.run_task/3
    (mix) lib/mix/cli.ex:79: Mix.CLI.run_task/2
    (elixir) lib/code.ex:813: Code.require_file/2
```

It appears this can be fixed by passing the "u" (Unicode) option to Regex.compile in peg.ex.
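A minimal sketch of why the "u" option matters, assuming (not confirmed from peg.ex) that Neotomex compiles character classes such as `[^a]` with Regex.compile and no options. Without "u", the PCRE engine behind Elixir's Regex treats the input as raw bytes, so a negated class consumes only the first byte of a multi-byte UTF-8 character. The `input` below is just the remainder of the example string after "p" has been consumed.

```elixir
# Remainder of "pɪˈkɑːn" after the leading "p" has matched [^a].
input = "ɪˈkɑːn"

# Without "u", PCRE matches byte by byte: [^a] grabs only the first
# byte (0xC9 = 201) of the two-byte UTF-8 encoding of "ɪ" (U+026A).
bytewise = Regex.compile!("^[^a]")
Regex.run(bytewise, input)
# → [<<201>>]

# With "u", PCRE matches whole code points, so all of "ɪ" is consumed.
unicode = Regex.compile!("^[^a]", "u")
Regex.run(unicode, input)
# → ["ɪ"]
```

A half-character match like `<<201>>` would be consistent with the MatchError above if grammar.ex splits the matched text off the input by character count, though that detail is not verified here.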

Hi @brentspell -- thanks for reporting this along with the fix!

By any chance, would you like to open a PR with the fix and a test? If so, I'll be much more responsive with getting it merged. Otherwise, I can put the diff together -- it's definitely a useful addition.

I ended up going with NimbleParsec for this parser, but if I come across another Neotomex use case, I'll open a PR for this.

Makes sense -- thanks!