"may" is always parsed as a month, even when it should not be.
cphyc opened this issue
Corentin Cadiou commented
The English lowercase word "may" is often (always?) recognized as the month with the same spelling. For example:
> chrono.parse("I may be here. May the force be with you. Theresa may become PM soon.")
[
  ParsingResult {
    reference: ReferenceWithTimezone { instant: 2023-06-30T10:16:04.584Z },
    refDate: 2023-06-30T10:16:04.584Z,
    index: 2,
    text: 'may',
    start: ParsingComponents {
      reference: [ReferenceWithTimezone],
      knownValues: [Object],
      impliedValues: [Object]
    },
    end: null
  },
  ParsingResult {
    reference: ReferenceWithTimezone { instant: 2023-06-30T10:16:04.584Z },
    refDate: 2023-06-30T10:16:04.584Z,
    index: 15,
    text: 'May',
    start: ParsingComponents {
      reference: [ReferenceWithTimezone],
      knownValues: [Object],
      impliedValues: [Object]
    },
    end: null
  },
  ParsingResult {
    reference: ReferenceWithTimezone { instant: 2023-06-30T10:16:04.584Z },
    refDate: 2023-06-30T10:16:04.584Z,
    index: 50,
    text: 'may',
    start: ParsingComponents {
      reference: [ReferenceWithTimezone],
      knownValues: [Object],
      impliedValues: [Object]
    },
    end: null
  }
]
While this is OK when parsing something we know is a date, it yields many false positives when using chrono
to detect dates in regular text.
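
One possible workaround, until the parser handles this itself, is to post-filter the results and discard any match whose text is just the word "may". The sketch below is only an illustration: `dropBareMay` is a hypothetical helper (not part of chrono), and the sample objects are hand-written stand-ins carrying only the `index` and `text` fields from the output above.

```javascript
// Hypothetical helper (not part of chrono): drop results whose matched text
// is only the word "may", with no day or year attached to it.
function dropBareMay(results) {
  return results.filter((result) => !/^may$/i.test(result.text.trim()));
}

// Hand-written stand-ins for the three results above, plus a genuine date.
const sample = [
  { index: 2, text: 'may' },
  { index: 15, text: 'May' },
  { index: 50, text: 'may' },
  { index: 70, text: 'May 5' },
];

console.log(dropBareMay(sample).map((r) => r.text)); // [ 'May 5' ]
```

With chrono, the same filter could presumably be applied to the array returned by `chrono.parse(text)`, or registered as a custom refiner on a cloned configuration if your version supports that. The trade-off is that a bare "May" which really does mean the month is dropped as well, so this only suits the "detect dates in regular text" use case.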