Only find one match per position

Question

kareman opened this issue 4 years ago

Hi, nice library, I especially like the syntax.

Is there any way of parsing only one match per position? Now when I run code like this:

let myany = LazilyRepeating(one() as Wildcard<String>)
let token = Token("." • myany • Literal(" ") • myany • " ")
let parser = myany • token
parser.forwardMatches(enteringFrom: Match<String>(over: #" .1  slkjdf.2 .3  "#)).forEach {
	print($0.captures(for: token))
}

it seems to find all possible matches for each position:

[".1  "]
[".1  slkjdf.2 "]
[".1  slkjdf.2 .3 "]
[".1  slkjdf.2 .3  "]
[".1  slkjdf.2 "]
[".1  slkjdf.2 .3 "]
[".1  slkjdf.2 .3  "]
[".1  slkjdf.2 .3 "]
[".1  slkjdf.2 .3  "]
[".1  slkjdf.2 .3  "]
[".2 .3 "]
[".2 .3  "]
[".2 .3  "]
[".3  "]

How do I tell LazilyRepeating to only find the shortest possible match?

Constantino Tsarouhas · Answer 1 · Thu Jun 04 2020 00:38:11 GMT+0800 (China Standard Time)

Thanks for your kind words!

I haven’t had much time recently to continue implementing a few more features I had in mind, and especially to document a few things! (But I’m definitely planning to!)

Token is a pattern class that is supposed to be used at most once in a given pattern. When used in two or more places, it can capture multiple sequences per match, but that behaviour is not well-defined. The multiple-captures feature is only well-defined when a single token object is used within some kind of repeating pattern (like LazilyRepeating): in that case it blindly follows the matching semantics of the repeating pattern.

I see multiple uses of the token within the pattern. Did you instead mean to use a back-reference, which matches a subsequence that a previous token matched? The read-me mentions Referencing but I haven’t gotten around to implementing it yet.

LazilyRepeating already tries to match its sub-pattern as few times as possible, or in your example, by applying the wildcard zero times, once, twice, thrice, and so on. In your example output, these are the first, second, third, … elements. forwardMatches(…) returns a lazily evaluated array of every possible match, so you can just take the first match to get the one with the fewest repetitions.
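The "fewest repetitions first, evaluated lazily" ordering can be illustrated without the library. The following is a minimal sketch in plain Swift, not PatternKit's implementation: lazyCandidates(in:endingWith:) is a hypothetical helper that mimics a lazily repeating wildcard followed by a literal terminator, offering its candidates shortest first.

```swift
/// Hypothetical helper, not part of PatternKit: lazily yields every prefix of
/// `input` that ends in `terminator`, shortest first. This mimics a wildcard
/// repeated zero, one, two, … times and then followed by a literal.
func lazyCandidates(in input: String, endingWith terminator: Character) -> AnyCollection<Substring> {
    AnyCollection(
        input.indices.lazy
            .filter { input[$0] == terminator }   // positions where the literal matches
            .map { input[...$0] }                 // everything up to and including it
    )
}

let input = ".1 .2 .3 "
let candidates = lazyCandidates(in: input, endingWith: " ")

// Taking the first element only scans as far as the first terminator.
if let shortest = candidates.first {
    print(shortest)   // the shortest candidate: ".1 "
}
```

Because the chain is lazy, asking for the first candidate does not compute the longer ones. Taking the first element of the result of forwardMatches(…) is presumably similar in spirit.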

However, forwardMatches(enteringFrom:) is a foundational method that returns partial matches (matches that might not cover the whole string yet). Larger patterns build on top of these partial matches. The matches(over:) method (in Pattern) does exclude all partial matches, and is the method for “client use”. :)
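To make the partial-versus-full distinction concrete, here is a small library-independent sketch. It assumes that a "full" match is simply a candidate spanning the whole input. How matches(over:) actually filters is not shown in this thread, and fullMatches(among:over:) is a hypothetical helper.

```swift
/// Hypothetical helper, not part of PatternKit: keeps only the candidates
/// that cover `input` from its start index to its end index.
func fullMatches(among partial: [Substring], over input: String) -> [Substring] {
    partial.filter {
        $0.startIndex == input.startIndex && $0.endIndex == input.endIndex
    }
}

let text = ".1 .2 .3 "

// Three partial candidates, as a foundational matching step might produce them.
let partial = [text.prefix(3), text.prefix(6), text.prefix(9)]

// Only the candidate that reaches the end of `text` survives the filter.
print(fullMatches(among: partial, over: text))
```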

Kare Morstol · Answer 2 · Wed Jul 08 2020 00:12:47 GMT+0800 (China Standard Time)

Hi, and sorry about the very late reply. I liked the syntax in this project so much I implemented something similar in my PEG parser at https://github.com/kareman/Patterns. It made the API far easier to work with (as soon as you find out how to type • and ¿ 😄).

Constantino Tsarouhas · Answer 3 · Wed Jul 08 2020 02:04:32 GMT+0800 (China Standard Time)

Looks nice, especially its strong typing and its VM approach! 💯