Help creating a rule for a last name
rejeep opened this issue
Hi,
I'm trying to create a rule for a last name. This is what I have come up with:
rule last_name
  [A-Za-z]+ space [iI]+ | [iI]+ &[^iI]+ | [^iI] [A-Za-z]+
end
So:
- Match a "name", followed by a space, followed by any number of i's. Or
- If the first characters are one or more i's, then something that is not an i must follow. Or
- If the first character is not an i, then any "name" can follow
If correct, this should be able to parse:
- Rule 1) Love III
- Rule 2) Immelman
- Rule 3) Donald
But it fails on Immelman. It would also fail on, for example, Love IIIx.
I guess my second rule is wrong? But why?
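One likely explanation is that the & predicate only looks ahead and consumes nothing, so the second alternative matches just the leading I and leaves the rest of the input unparsed. The sketch below illustrates this with Ruby's built-in Regexp instead of Citrus. That is an analogy, on the assumption that the (?=...) lookahead behaves like the & predicate here. The names second_alt and consuming_alt are hypothetical, and consuming_alt is only one possible way to make the alternative consume the whole name.

```ruby
# Analogy only: Regexp's (?=...) stands in for the Citrus & predicate.
# Second alternative as written: one or more i's, then a lookahead.
second_alt = /\A[iI]+(?=[^iI]+)/

# Hypothetical variant that consumes the rest of the name:
# one or more i's, one letter that is not an i, then any letters.
consuming_alt = /\A[iI]+[A-HJ-Za-hj-z][A-Za-z]*/

consumed = 'Immelman'[second_alt]
puts consumed.inspect                    # the part matched by the lookahead version
puts 'Immelman'[consuming_alt].inspect   # the part matched by the consuming version
puts 'III'[consuming_alt].inspect        # a name made only of i's is still rejected
```

With the lookahead version only the first character is consumed, which would explain why a full parse of Immelman fails.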
I don't follow the logic that you're using. Why not just try a simpler pattern? For example: [A-Za-z]+ (" "* [iI]+)?
Here's what I get when I use this pattern in irb:
irb> require 'citrus'
=> true
irb> rule = Citrus.rule '[A-Za-z]+ (" "* [iI]+)?'
=> /[A-Za-z]/+ (" "* /[iI]/+)?
irb> rule.test 'Love III'
=> 8
irb> rule.test 'Immelman'
=> 8
irb> rule.test 'Donald'
=> 6
Because I have another rule, which would conflict with this. If I do it your way, the name David Love III would parse as first name David, middle name Love, and last name III. But the first name should be David and the last name Love III. What I'm trying to do with my rule is make sure that the last name cannot consist only of I's.
Maybe it's simpler if I give you the whole grammar:
grammar Name
  rule name
    first_name space middle_name space last_name |
    first_name space last_name |
    first_name
  end
  rule first_name
    [A-Za-z]+
  end
  rule last_name
    [A-Za-z]+ space [iI]+ | [iI]+ &[^iI]+ | [^iI] [A-Za-z]+
  end
  rule middle_name
    ([A-Za-z] '.') {
      delete('.')
    }
    | [A-Za-z]+
  end
  rule space
    [ \t]*
  end
end
Why don't you try something like this:
require 'citrus'
Citrus.eval(<<CITRUS)
grammar Name
  rule name
    first_name space middle_name space last_name space suffix? |
    first_name space last_name space suffix? |
    first_name
  end
  rule first_name
    [A-Za-z]+
  end
  rule middle_name
    ([A-Za-z] '.') {
      delete('.')
    }
    | [A-Za-z]+
  end
  rule last_name
    !suffix [A-Za-z]+
  end
  rule suffix
    [iI]+ | `jr` '.'?
  end
  rule space
    [ \t]*
  end
end
CITRUS
puts Name.parse("David Love III").dump
This grammar separates out the suffix of the name (I've allowed for "jr." as well, just to demonstrate) from the last name. You can see in the dump of the match how the various tokens are broken up.
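One caveat that may be worth checking: because !suffix succeeds only when no suffix can start at the current position, a last name that merely begins with an i (such as Immelman) would presumably be rejected too. The sketch below shows the idea with Ruby's Regexp as a stand-in for the Citrus predicate. This is an analogy and has not been run against Citrus. The names unbounded and bounded are hypothetical.

```ruby
# Analogy only: Regexp's (?!...) stands in for the Citrus ! predicate.
# Unbounded: reject any name that merely starts with an i.
unbounded = /\A(?![iI]+)[A-Za-z]+/

# Bounded: reject only names that consist entirely of i's.
bounded = /\A(?![iI]+\z)[A-Za-z]+/

%w[Immelman Love III].each do |word|
  puts "#{word}: unbounded=#{word[unbounded].inspect} bounded=#{word[bounded].inspect}"
end
```

In the grammar, the analogous change would be to have the predicate fail only when the suffix is not followed by another letter. Something like !(suffix ![A-Za-z]) might work, but that is an untested assumption.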
Ahh, nice. Thanks!