spencermountain / compromise

modest natural-language processing

Home Page:http://compromise.cool


Greedy tag matching and punctuation

amorfee opened this issue


Hello,

I've come across an issue with greedy tag matching and comma separation. The comma is included as part of the match, so several comma-separated tagged terms are combined into a single match.

[Screenshot from 2024-04-09 at 11:08:39]

This is likely expected behaviour for a tag such as #Place, but is there a way to force the comma to act as a word separator? Using .normalize() doesn't seem to help; it just removes the comma from the match.
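To make the behaviour concrete, here is a plain-JavaScript sketch that does not use compromise at all. It assumes a hypothetical term structure `{ text, tags, hasComma }` in which a comma is a flag on the preceding term rather than a token of its own, so a greedy run of tagged terms walks straight past it. The sentence, tags, and the `greedyRuns` helper are all illustrative, not compromise internals.

```javascript
// Hypothetical term objects: { text, tags, hasComma }. In this sketch a comma
// is stored as a flag on the preceding term rather than as its own token.
const sentenceTerms = [
  { text: 'We', tags: ['Pronoun'], hasComma: false },
  { text: 'visited', tags: ['Verb'], hasComma: false },
  { text: 'Paris', tags: ['Place'], hasComma: true },
  { text: 'London', tags: ['Place'], hasComma: true },
  { text: 'New', tags: ['Place'], hasComma: false },
  { text: 'York', tags: ['Place'], hasComma: false },
  { text: 'and', tags: ['Conjunction'], hasComma: false },
  { text: 'Berlin', tags: ['Place'], hasComma: false },
];

// Collect consecutive terms carrying `tag` into runs, like a greedy `#Tag+`.
// With breakOnComma, a term with a trailing comma closes the current run.
function greedyRuns(terms, tag, breakOnComma = false) {
  const runs = [];
  let current = [];
  const flush = () => {
    if (current.length > 0) {
      runs.push(current.join(' '));
      current = [];
    }
  };
  for (const term of terms) {
    if (!term.tags.includes(tag)) {
      flush(); // an untagged term ends any open run
      continue;
    }
    current.push(term.text);
    if (breakOnComma && term.hasComma) flush();
  }
  flush();
  return runs;
}

const greedy = greedyRuns(sentenceTerms, 'Place');
const separated = greedyRuns(sentenceTerms, 'Place', true);
console.log(greedy);    // [ 'Paris London New York', 'Berlin' ]
console.log(separated); // [ 'Paris', 'London', 'New York', 'Berlin' ]
```

Treating the comma as a run boundary is what turns one merged match into separate places, while "New York" still stays together because it has no comma between its words.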

Thank you

Hey, yeah, good question. There are a few ways you could do this.

You could split on whatever you like, then filter the parts down:

// split after every term that ends with a comma
let parts = doc.splitAfter('@hasComma');
// keep only the parts that contain a #Place
parts = parts.if('#Place');
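The split-then-filter idea can be sketched in plain JavaScript, again without compromise. This assumes the same hypothetical `{ text, tags, hasComma }` terms; `splitAfterComma` and `keepIfTagged` are illustrative helpers that mimic the intent of `splitAfter('@hasComma')` and `.if('#Place')`, not their actual implementation.

```javascript
// Hypothetical term objects, as in a tokenised "Paris, London, New York, and more".
const listTerms = [
  { text: 'Paris', tags: ['Place'], hasComma: true },
  { text: 'London', tags: ['Place'], hasComma: true },
  { text: 'New', tags: ['Place'], hasComma: false },
  { text: 'York', tags: ['Place'], hasComma: true },
  { text: 'and', tags: ['Conjunction'], hasComma: false },
  { text: 'more', tags: ['Adjective'], hasComma: false },
];

// Start a new segment after every term with a trailing comma
// (the idea behind splitAfter('@hasComma')).
function splitAfterComma(terms) {
  const segments = [];
  let current = [];
  for (const term of terms) {
    current.push(term);
    if (term.hasComma) {
      segments.push(current);
      current = [];
    }
  }
  if (current.length > 0) segments.push(current);
  return segments;
}

// Keep segments containing at least one term with `tag`
// (the idea behind parts.if('#Place')).
function keepIfTagged(segments, tag) {
  return segments.filter((segment) => segment.some((t) => t.tags.includes(tag)));
}

const places = keepIfTagged(splitAfterComma(listTerms), 'Place')
  .map((segment) => segment.map((t) => t.text).join(' '));
console.log(places); // [ 'Paris', 'London', 'New York' ]
```

In this sketch a kept segment retains every word it contains, so a segment mixing tagged and untagged words would come through whole; only segments with no tagged term at all ("and more" here) are dropped.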

I sometimes do an aggressive split and then join 'em back up, which is probably a weirder process:

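// split aggressively around each #Place
let parts = doc.split('#Place');
// then join neighbouring parts back up under the given conditions
parts = parts.joinIf('#Place && @hasComma', '#Place');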

dunno!
cheers


Thank you, .splitAfter() seems to do what we need.