Punctuation following abbreviations causes sentences to merge
Fdawgs opened this issue · comments
Node version: 18.18.2
Compromise version: 14.10
As the title states, full stops and other sentence-ending punctuation (?, !, etc.) that occur after an abbreviation cause the trailing sentence to be treated as part of the original sentence.
Reproduction:
const nlp = require('compromise');
const text = "Dr. Hibbert has advised starting Homer on morphine 400 mg. I have copied this letter to his general practitioner.";
const sentences = nlp(text).sentences().out('array');
console.log(sentences);
/**
* outputs:
* [
* 'Dr. Hibbert has advised starting Homer on morphine 400 mg. I have copied this letter to his general practitioner.'
* ]
*/
Comparison without using an abbreviation:
const nlp = require('compromise');
const text = "Dr. Hibbert has advised starting Homer on morphine 400 milligrams. I have copied this letter to his general practitioner.";
const sentences = nlp(text).sentences().out('array');
console.log(sentences);
/**
* outputs:
* [
* 'Dr. Hibbert has advised starting Homer on morphine 400 milligrams.',
* 'I have copied this letter to his general practitioner.'
* ]
*/
hey Frazer, with periods, this is the expected behaviour for abbreviations, like '400 mg. of THC', and 'a sr. in high-school'.
but yeah '12 mg!' and '12 mg?' should truncate the sentence.
will add this one to the list. Good catch
cheers
fixed in 14.10.1, thanks for the help
@spencermountain this is half-fixed. I think the problem is when an abbreviation is used in text and then followed by a genuine new sentence.
I prescribed him 400mg. He went to the pharmacy.
As I think about this, I guess there's no easy fix for it. We could detect an uppercase next word, but I imagine that will produce a lot of false positives.
FWIW we have a large body of clinical dialogue in text and we would rarely see the '.' after a unit abbreviation. It's not common at all. Usually it's presented without the '.', e.g. 'He was injected with 400mg of morphine.'
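For anyone who wants to experiment with the "uppercase next word" idea mentioned above, here is a minimal sketch of it as a standalone post-processing step. It is not how compromise handles this internally; `splitAfterUnitAbbreviation` and `UNIT_ABBREVIATIONS` are hypothetical names made up for illustration, and the unit list is an assumption rather than anything taken from the library.

```javascript
// Hypothetical helper, not part of compromise's API.
// Illustrative unit list (an assumption); extend it for your own data.
const UNIT_ABBREVIATIONS = ['mg', 'ml', 'kg', 'g', 'mcg'];

// Splits a string wherever "<digit><optional space><unit>." is followed by
// whitespace and an uppercase letter, treating that as a sentence boundary.
function splitAfterUnitAbbreviation(text) {
  const units = UNIT_ABBREVIATIONS.join('|');
  // Lookbehind: a digit, optional whitespace, a unit, then a full stop.
  // Lookahead: the next word starts with an uppercase ASCII letter.
  const boundary = new RegExp(`(?<=\\d\\s?(?:${units})\\.)\\s+(?=[A-Z])`);
  return text.split(boundary);
}

// A genuine new sentence after the abbreviation is split off.
console.log(splitAfterUnitAbbreviation('I prescribed him 400mg. He went to the pharmacy.'));

// A lowercase continuation is left alone.
console.log(splitAfterUnitAbbreviation('He took 400 mg. of THC yesterday.'));

// False positive: a capitalised drug name after the abbreviation triggers a split.
console.log(splitAfterUnitAbbreviation('He took 400 mg. Tylenol daily.'));
```

The last call shows the false-positive concern: any proper noun following the abbreviation looks the same as the start of a new sentence, so this heuristic alone cannot tell the two apart.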