Problem with Repeat
xanatos opened this issue · comments
I'm trying to create an importer for the Unicode NamesList.txt (http://www.unicode.org/Public/UNIDATA/NamesList.html the grammar and https://www.unicode.org/Public/UCD/latest/ucd/NamesList.txt the data). I've been able to do it using your peg 2.0.0 library, but I ran into two problems (either bugs in the library or features I misunderstood) that I had to work around.
The Repeat() extension method seems to "eat" a character even when it fails (it doesn't backtrack).
Source code: KlcImporter.zip
To reproduce: run the program. It will generate an output.txt file. You can compare it with the original NamesList.txt (for example with WinMerge) and they should be identical. Now replace
public virtual Expression Char() => X() + X() + X() + X() + ~(X() + ~X());
with
public virtual Expression Char() => X().Repeat(4, 6);
Re-run the program. Now the files are different ("Danish, Norwegian, Swedish, Walloon" becomes "anish, Norwegian, Swedish, Walloon"). A single character is "eaten" by the Repeat() in the ExpandLineContainerElement() expression, and it isn't returned to the stream even though the Repeat() fails. The sequence of events appears to be: EscChar() fails; Char() begins, reads one character, then fails without backtracking over the character it read (this is the error); String() then starts one character late and so reads one character fewer.
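To make the expected behaviour concrete, here is a minimal, self-contained sketch of a bounded repeat that restores the input position when it matches fewer than the minimum number of items. It does not use the peg library at all: Cursor, RepeatSketch and HexDigit are hypothetical names invented for this illustration, and HexDigit only assumes that X() matches one uppercase hexadecimal digit. If that assumption holds, it would also explain why exactly one character is lost: the "D" of "Danish" is a valid hex digit, so one item matches before the repeat gives up.

```csharp
using System;

// Hypothetical stand-in for a parser's input cursor (not a peg library type).
public sealed class Cursor
{
    private readonly string _text;
    public int Position { get; set; }
    public Cursor(string text) { _text = text; }
    public bool AtEnd => Position >= _text.Length;
    public char Current => _text[Position];
}

public static class RepeatSketch
{
    // Tries to match `item` at least `min` and at most `max` times.
    // If fewer than `min` items match, the cursor is moved back to where
    // the repeat started, so nothing is consumed on failure.
    public static bool Repeat(Cursor c, Func<Cursor, bool> item, int min, int max)
    {
        int start = c.Position;
        int count = 0;
        while (count < max)
        {
            int before = c.Position;
            if (!item(c))
            {
                c.Position = before; // undo any partial read by the failed item
                break;
            }
            count++;
        }
        if (count < min)
        {
            c.Position = start; // backtrack over the items that did match
            return false;
        }
        return true;
    }

    // Assumed meaning of X(): one uppercase hexadecimal digit.
    public static bool HexDigit(Cursor c)
    {
        if (c.AtEnd) return false;
        char ch = c.Current;
        bool ok = (ch >= '0' && ch <= '9') || (ch >= 'A' && ch <= 'F');
        if (ok) c.Position++;
        return ok;
    }
}
```

With this sketch, calling RepeatSketch.Repeat(cursor, RepeatSketch.HexDigit, 4, 6) on "Danish" fails and leaves the cursor at position 0, which is the behaviour I expected from X().Repeat(4, 6).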
P.S. I've added a
static CharacterSet To(this char from, char to, Func<char, bool> predicate)
extension overload. I think it would be a good addition to the ones you already provide. It is useful for implementing rules like "<sequence of characters in the range U+0020..U+02FF, except controls>"
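To illustrate the idea without depending on the library's types, here is a standalone sketch of a predicate-filtered character range. The name RangeWhere is made up for this example, and it returns a plain IEnumerable<char> instead of a CharacterSet, so it is not the overload from my source code, only the same idea in isolation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CharRangeSketch
{
    // Hypothetical helper: yields every character in [from, to]
    // for which the predicate returns true.
    public static IEnumerable<char> RangeWhere(char from, char to, Func<char, bool> predicate)
    {
        // Iterate with an int so the loop also terminates when to == '\uFFFF'.
        for (int code = from; code <= to; code++)
        {
            char ch = (char)code;
            if (predicate(ch))
            {
                yield return ch;
            }
        }
    }
}
```

The rule quoted above would then be written as CharRangeSketch.RangeWhere('\u0020', '\u02FF', ch => !char.IsControl(ch)), which skips the control characters U+007F..U+009F that fall inside the range.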