Help creating a rule for a last name

Question

rejeep opened this issue 13 years ago

Hi,

I'm trying to create a rule for a last name. This is what I have come up with:

rule last_name
  [A-Za-z]+ space [iI]+ | [iI]+ &[^iI]+ | [^iI] [A-Za-z]+
end

So:

Match a "name", followed by a space, followed by one or more i's. Or
If the first characters are one or more i's, then something that is not an i must follow. Or
If the first character is not an i, then any "name" can follow

If correct, this should be able to parse:

Rule 1) Love III
Rule 2) Immelman
Rule 3) Donald

But it fails on Immelman. It would also fail on, for example, Love IIIx.

I guess my second rule is wrong? But why?
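
A possible explanation, assuming that & in Citrus is a standard PEG and-predicate: it checks that the following input matches but consumes none of it. For Immelman, the first alternative fails because no i's follow the name, and the second alternative then succeeds after consuming only the leading I, so the rest of the input is left unparsed. The sketch below illustrates the same effect with Ruby's built-in regex lookahead instead of Citrus; the names SECOND_ALTERNATIVE and consumed_prefix are hypothetical.

```ruby
require 'strscan'

# Regex analogue (an assumption, not Citrus itself) of the second alternative:
#   [iI]+ &[^iI]+
# (?=...) is a lookahead: it checks what follows but consumes nothing.
SECOND_ALTERNATIVE = /[iI]+(?=[^iI]+)/

# Returns the text consumed from the start of the input, or nil if no match.
def consumed_prefix(input)
  scanner = StringScanner.new(input)
  scanner.scan(SECOND_ALTERNATIVE)
end

immelman_prefix = consumed_prefix("Immelman") # "I": only the leading i is consumed
iii_prefix = consumed_prefix("III")           # nil: nothing that is not an i follows

puts immelman_prefix.inspect
puts iii_prefix.inspect
```

If that is what happens here, a match that stops after one character would explain the failure whenever the whole input has to be consumed.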

Michael Jackson · Answer 1 · Thu Oct 27 2011 13:08:33 GMT+0800 (China Standard Time)

I don't follow the logic you're using. A simpler pattern such as [A-Za-z]+ (" "* [iI]+)? might be enough. Here's what I get when I use this pattern in irb:

irb> require 'citrus'
=> true
irb> rule = Citrus.rule '[A-Za-z]+ (" "* [iI]+)?'
=> /[A-Za-z]/+ (" "* /[iI]/+)?
irb> rule.test 'Love III'
=> 8
irb> rule.test 'Immelman'
=> 8
irb> rule.test 'Donald'
=> 6

Johan Andersson · Answer 2 · Thu Oct 27 2011 13:51:41 GMT+0800 (China Standard Time)

Because I have another rule that would conflict with this. If I do it your way, the name David Love III would parse as first name David, middle name Love, and last name III. But the first name should be David and the last name Love III. What I'm trying to do with my rule is make sure that the last name cannot consist only of I's.

Maybe it's simpler if I give you the whole grammar:

grammar Name
  rule name
    first_name space middle_name space last_name |
    first_name space last_name |
    first_name
  end

  rule first_name
    [A-Za-z]+
  end

  rule last_name
    [A-Za-z]+ space [iI]+ | [iI]+ &[^iI]+ | [^iI] [A-Za-z]+
  end

  rule middle_name
    ([A-Za-z] '.') {
      delete('.')
    }
    | [A-Za-z]+
  end

  rule space
    [ \t]*
  end
end
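
The conflict described above can be sketched with a plain Ruby regex. This is not Citrus, and it is only an approximation, since a regex backtracks where a PEG does not. The sketch plugs the simpler last-name pattern into the three-part alternative; NAIVE_FULL_NAME and split_naively are hypothetical names.

```ruby
# Approximation of "first_name space middle_name space last_name" with the
# simpler last-name pattern [A-Za-z]+ (" "* [iI]+)? used for last_name.
NAIVE_FULL_NAME = /
  \A
  (?<first>[A-Za-z]+) [ \t]*
  (?<middle>[A-Za-z]\.|[A-Za-z]+) [ \t]*
  (?<last>[A-Za-z]+(?:[ ]*[iI]+)?)
  \z
/x

# Returns a hash of the named parts, or nil if the input does not match.
def split_naively(input)
  match = NAIVE_FULL_NAME.match(input)
  match && match.named_captures
end

parts = split_naively("David Love III")
# "Love" ends up as the middle name and "III" as the last name,
# which is the unwanted split described in the post.
puts parts["middle"]
puts parts["last"]
```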

Michael Jackson · Answer 3 · Thu Oct 27 2011 14:08:16 GMT+0800 (China Standard Time)

Why don't you try something like this:

require 'citrus'

Citrus.eval(<<CITRUS)
grammar Name
  rule name
    first_name space middle_name space last_name space suffix? |
    first_name space last_name space suffix? |
    first_name
  end

  rule first_name
    [A-Za-z]+
  end

  rule middle_name
    ([A-Za-z] '.') {
      delete('.')
    }
    | [A-Za-z]+
  end

  rule last_name
    !suffix [A-Za-z]+
  end

  rule suffix
    [iI]+ | `jr` '.'?
  end

  rule space
    [ \t]*
  end
end
CITRUS

puts Name.parse("David Love III").dump

This grammar separates out the suffix of the name (I've allowed for "jr." as well, just to demonstrate) from the last name. You can see in the dump of the match how the various tokens are broken up.
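
The same idea, a separate suffix plus a last name that must not itself be a suffix, can be sketched with Ruby's built-in regexes. This is an illustration under assumptions, not the Citrus grammar: the constants and split_name are hypothetical, the single-name alternative is left out, and the backtick string is assumed to be case-insensitive. One deliberate addition is the (?![A-Za-z]) guard: a bare "not a suffix" check would also reject last names that merely begin with an I, such as Immelman, and a plain PEG not-predicate may behave the same way.

```ruby
# Suffix: one or more i's, or "jr" with an optional dot (any letter case).
SUFFIX = /(?:[iI]+|[jJ][rR]\.?)/

# Last name: letters, unless the whole word is a suffix. The inner
# (?![A-Za-z]) is an addition in this sketch so that "Immelman" still passes.
LAST_NAME = /(?!#{SUFFIX}(?![A-Za-z]))[A-Za-z]+/

FULL_NAME = /
  \A
  (?<first>[A-Za-z]+)
  (?:[ \t]+(?<middle>[A-Za-z]\.|[A-Za-z]+))?
  [ \t]+(?<last>#{LAST_NAME})
  (?:[ \t]+(?<suffix>#{SUFFIX}))?
  \z
/x

# Returns a hash of the named parts, or nil if the input does not match.
def split_name(input)
  match = FULL_NAME.match(input)
  match && match.named_captures
end

love = split_name("David Love III")      # last is "Love", suffix is "III"
immelman = split_name("Trevor Immelman") # last is "Immelman", no suffix
puts love["last"], love["suffix"], immelman["last"]
```

Because "III" is rejected as a last name, the three-part reading fails and the matcher falls back to first name, last name, and suffix.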

Johan Andersson · Answer 4 · Sun Oct 30 2011 02:18:35 GMT+0800 (China Standard Time)

Ahh, nice. Thanks!