spencermountain / compromise

modest natural-language processing

Home Page: http://compromise.cool

Get .terms() but keep hyphenated strings (similar to .hyphenated())

PuneetKohli opened this issue

Is there a way to achieve this?

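hey Puneet, good question:
It's a little weird, but you could do .splitAfter('!@hasHyphen'), which splits after every term that isn't followed by a hyphen, so hyphenated words stay in one piece. Like this: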
https://runkit.com/spencermountain/659822ebdfb7e500085838fd

Alternatively, you could shim in a custom tokenizer, like:

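import nlp from 'compromise'

// override the term-splitter so it breaks on spaces only, not hyphens
nlp.world().methods.one.tokenize.splitTerms = function (str) {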
  return str.split(/ /)
}
nlp('one two-three four five').debug()
// one, two-three, four, five

that one is obviously simplified (it splits on single spaces only, so tabs, newlines and double spaces aren't handled), but let me know if you'd like some more help.
cheers
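
For anyone who wants the shim above to be a bit less simplified, here is a rough sketch of a splitter that also copes with tabs, newlines and repeated spaces while still leaving hyphenated words alone. The name splitTermsKeepHyphens is made up for illustration and is not part of compromise. It also assumes the tokenizer is happy with a plain array of strings, as in the snippet above. I haven't checked how discarding the whitespace affects output like .text().

```javascript
// Hypothetical helper, not part of compromise: split on runs of whitespace
// and leave hyphenated words such as "two-three" as a single term.
function splitTermsKeepHyphens(str) {
  return str
    .trim() // drop leading/trailing whitespace
    .split(/\s+/) // break on spaces, tabs and newlines, but not on hyphens
    .filter(Boolean) // an empty input would otherwise give ['']
}

console.log(splitTermsKeepHyphens('one  two-three\tfour five'))
// [ 'one', 'two-three', 'four', 'five' ]
```

It could then be assigned to nlp.world().methods.one.tokenize.splitTerms the same way as the one-liner above.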