spencermountain / compromise

modest natural-language processing

Home Page: http://compromise.cool

Matched sentence text duplicated if match value occurs more than once in string

Fdawgs opened this issue

Tested with 14.8.2, 14.9.0 and 14.10.0 on Node 18.18.2

As the title states, the matched sentence text is duplicated if the match value occurs more than once in the string:

const nlp = require('compromise');
const text = "She sells sea shells by the sea shore"; // Two occurrences of "sea"

const match = nlp(text).match("sea");

console.log(match.sentence().text()); // Returns "She sells sea shells by the sea shoreShe sells sea shells by the sea shore"

Happy to have a go at fixing this, but I have no idea where in the code this lives!

Hey Frazer, I know it's an awkward output, but I think it's technically doing what it's supposed to.
There are two matches of "sea" in the same sentence, so calling match.sentences() returns two copies of the sentence, one per match.
You may want to run match.sentences().unique().text().
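
To make the doubling easier to picture, here is a toy model in plain JavaScript. It is only a sketch and not how compromise is implemented: splitSentences, findMatches, parentSentences and unique are hypothetical helpers written for this illustration, and the sentence splitting is deliberately naive. It assumes each match is mapped back to its parent sentence, so two matches in one sentence produce two copies.

```javascript
// Toy model only: plain JavaScript, not compromise internals.
// All helper names below are hypothetical and exist just for this illustration.

// Naive sentence splitter: breaks after ., ! or ? followed by whitespace.
function splitSentences(text) {
  return text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
}

// One entry per occurrence of `word`, each remembering which sentence it is in.
// Assumes `word` contains no regex special characters.
function findMatches(sentences, word) {
  const matches = [];
  const pattern = new RegExp(`\\b${word}\\b`, 'gi');
  sentences.forEach((sentence, index) => {
    for (const hit of sentence.matchAll(pattern)) {
      matches.push({ sentenceIndex: index, offset: hit.index });
    }
  });
  return matches;
}

// Map every match to its parent sentence: two matches in one sentence give two copies.
function parentSentences(sentences, matches) {
  return matches.map((m) => sentences[m.sentenceIndex]);
}

// Drop repeated sentence strings, keeping first-seen order.
function unique(list) {
  return [...new Set(list)];
}

const text = 'She sells sea shells by the sea shore';
const sentences = splitSentences(text);
const matches = findMatches(sentences, 'sea');

// Joining one sentence per match reproduces the doubled text from the report.
const duplicated = parentSentences(sentences, matches).join('');
// De-duplicating first leaves a single copy of the sentence.
const deduped = unique(parentSentences(sentences, matches)).join('');

console.log(duplicated);
console.log(deduped);
```

In this model the de-duplication step plays the role that .unique() plays in the suggested call above.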

I think sentences() should probably de-duplicate by default, and the end punctuation should be present when the sentence doubles like that, but yeah, I think this one's mostly doing what it should.
Cheers

Thanks Spencer, that'll teach me for not reading the docs!