wanasit / chrono

A natural language date parser in Javascript

"may" is always parsed as a month, even when it is not one.

cphyc opened this issue

The English lowercase word "may" is often (always?) recognized as the month with the same spelling. For example:

> chrono.parse("I may by here. May the force be with you. Theresa may become PM soon.")
[
  ParsingResult {
    reference: ReferenceWithTimezone { instant: 2023-06-30T10:16:04.584Z },
    refDate: 2023-06-30T10:16:04.584Z,
    index: 2,
    text: 'may',
    start: ParsingComponents {
      reference: [ReferenceWithTimezone],
      knownValues: [Object],
      impliedValues: [Object]
    },
    end: null
  },
  ParsingResult {
    reference: ReferenceWithTimezone { instant: 2023-06-30T10:16:04.584Z },
    refDate: 2023-06-30T10:16:04.584Z,
    index: 15,
    text: 'May',
    start: ParsingComponents {
      reference: [ReferenceWithTimezone],
      knownValues: [Object],
      impliedValues: [Object]
    },
    end: null
  },
  ParsingResult {
    reference: ReferenceWithTimezone { instant: 2023-06-30T10:16:04.584Z },
    refDate: 2023-06-30T10:16:04.584Z,
    index: 50,
    text: 'may',
    start: ParsingComponents {
      reference: [ReferenceWithTimezone],
      knownValues: [Object],
      impliedValues: [Object]
    },
    end: null
  }
]

While this is OK when parsing something we know is a date, it yields many false positives when using chrono to detect dates in regular text.
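
One possible stopgap on the caller's side is to post-filter the results and discard any match whose text is only the word "may". This is a rough sketch, not a fix in chrono itself: `dropBareMay` is a hypothetical helper name, and it relies only on the `text` field visible in the output above. The obvious trade-off is that a genuine bare "May" (for example "see you in May", if chrono matches just the month name) would be discarded too.

```javascript
// Heuristic post-filter (hypothetical helper, not part of chrono's API).
// Drops results whose matched text is only the word "may", which is the
// ambiguous case reported in this issue. Matches with more context,
// such as "May 4th", are kept.
function dropBareMay(results) {
  return results.filter((r) => r.text.trim().toLowerCase() !== "may");
}

// Made-up stand-ins for ParsingResult objects, so the sketch runs without
// chrono installed. Only `index` and `text` are modelled here.
const sample = [
  { index: 2, text: "may" },
  { index: 15, text: "May" },
  { index: 60, text: "May 4th" },
];

console.log(dropBareMay(sample).map((r) => r.text)); // [ 'May 4th' ]
```

With chrono itself, the same helper would presumably be applied as `dropBareMay(chrono.parse(text))`.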

See cphyc/thunderbird_date#2.