yohasebe / engtagger

English Part-of-Speech Tagger Library; a Ruby port of Lingua::EN::Tagger

present participle identified as a gerund

brandondrew opened this issue

In the following example, "attaching" is not a gerund, but EngTagger identifies it as one:

irb(main):009:0> text = "I'm attaching a flyer with our information."
=> "I'm attaching a flyer with our information."
irb(main):010:0> tagged_text = tagger.add_tags(text)
=> "<prp>I</prp> <vbp>'m</vbp> <vbg>attaching</vbg> <det>a</det> <nn>flyer</nn> <...
irb(main):011:0> readable_tagged_text = tagger.get_readable(text)
=> "I/PRP 'm/VBP attaching/VBG a/DET flyer/NN with/IN our/PRPS information/NN ./PP"
irb(main):012:0>

"Attaching" is a present participle—the main verb of the sentence. In order to be a gerund, it would need to act as a noun, for example in the following sentence:

Attaching paper clips to small piles of papers is very boring.
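The distinction above can be roughly approximated after tagging. The following is a minimal sketch, not part of EngTagger: it post-processes the "word/TAG" string returned by get_readable and treats a VBG token as a present participle when it directly follows a form of "be", and as a possible gerund otherwise. The names BE_FORMS, parse_readable, and classify_vbg are hypothetical, and the heuristic is an assumption that will miss many cases (for example, an adverb between "be" and the participle).

```ruby
# Forms of "be" that can introduce a progressive construction.
# Illustrative list only; not taken from EngTagger.
BE_FORMS = %w[am 'm is 's are 're was were be been].freeze

# Split EngTagger-style readable output ("word/TAG word/TAG ...")
# into [word, tag] pairs. rpartition splits on the last "/" only.
def parse_readable(readable)
  readable.split.map do |token|
    word, _, tag = token.rpartition("/")
    [word, tag]
  end
end

# For each VBG token, return [word, :participle] when the previous token
# is a form of "be", otherwise [word, :gerund_candidate].
# This is a rough heuristic, not a syntactic analysis.
def classify_vbg(readable)
  pairs = parse_readable(readable)
  results = pairs.each_with_index.map do |(word, tag), i|
    next unless tag == "VBG"
    prev_word = i.zero? ? nil : pairs[i - 1][0].downcase
    [word, BE_FORMS.include?(prev_word) ? :participle : :gerund_candidate]
  end
  results.compact
end

readable = "I/PRP 'm/VBP attaching/VBG a/DET flyer/NN with/IN our/PRPS information/NN ./PP"
p classify_vbg(readable)
# => [["attaching", :participle]]
```

A sentence-initial VBG, as in the "Attaching paper clips ..." example, has no preceding form of "be" and would come back as :gerund_candidate under this heuristic.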

You are right: engtagger's tagging is often inaccurate. It is a port of Perl's Lingua::EN::Tagger library, so any problems in the statistical data shipped with the original library carry over to engtagger.

For that reason, I have created another Ruby gem for better English sentence analysis: ruby-spacy. It requires Python's spaCy library to be installed on your system, but it is more accurate and richer in features.

That said, I think engtagger has its own merits: it lets you quickly check the part of speech of words without having to build a toolchain involving Python.