scrapinghub / dateparser

python parser for human readable dates

Strange parser error: search_dates parses "2010 Year" to a date with year of 4033

leeprevost opened this issue · comments

Very strange issue.

dateparser.__version__
'1.1.8'


import datetime
from dateparser.search import search_dates

settings = {
    'RELATIVE_BASE': datetime.datetime(2023, 7, 31, 0, 0),
    'PREFER_DAY_OF_MONTH': 'first',
    'PREFER_DATES_FROM': 'future',
    'REQUIRE_PARTS': ['year', 'month'],
    'DATE_ORDER': 'YMD',
}
s = 'Closing Yield, 2010 Year Treasury notes On Dec 31, 2023'
search_dates(s, settings=settings)

Result:
Out[27]:

[('2010 Year', datetime.datetime(4033, 7, 31, 0, 0)),
 ('On Dec 31, 2023', datetime.datetime(2023, 12, 31, 0, 0))]

(The first match yields an impossible year, 4033.)

I also posted this question on Stack Overflow.

This is because “year” is interpreted the same as “years”, so “2010 Year” is read as the relative expression “2010 years later”. Applied to the RELATIVE_BASE of 2023-07-31, that gives 2023 + 2010 = 4033.
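
The arithmetic behind the reported result can be reproduced with the standard library alone. This is only an illustration of the year shift, not dateparser's own code; the helper name shift_years is made up for this example.

```python
import datetime

# RELATIVE_BASE taken from the settings in the report.
base = datetime.datetime(2023, 7, 31, 0, 0)


def shift_years(moment, years):
    """Move a datetime forward by a whole number of years (illustrative helper)."""
    return moment.replace(year=moment.year + years)


# "2010 Year" read as "2010 years later" relative to the base date.
shifted = shift_years(base, 2010)
print(shifted)  # 4033-07-31 00:00:00
```

The output matches the datetime.datetime(4033, 7, 31, 0, 0) shown in the report.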

Maybe we could make it so that the singular “year” is only treated as a relative unit in “1 year”, and any other number is translated to a calendar year, so that “2010 Year” becomes “year 2010”, for example. But it may not be trivial to address.
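
As a rough sketch of that rule, here is a preprocessing step that could run on the input text before it reaches search_dates. It lives entirely outside dateparser; the function name and regex are assumptions made for illustration, and how dateparser handles the rewritten text has not been checked here.

```python
import re

# Hypothetical pattern: a number followed by the singular word "year".
# The trailing \b keeps the plural "years" from matching.
_SINGULAR_YEAR = re.compile(r"\b(\d+)\s+year\b", re.IGNORECASE)


def rewrite_singular_year(text):
    """Rewrite '<N> year' to 'year <N>' unless N is 1."""
    def _swap(match):
        number = match.group(1)
        if int(number) == 1:
            return match.group(0)  # leave "1 year" as a relative expression
        return f"year {number}"

    return _SINGULAR_YEAR.sub(_swap, text)


s = 'Closing Yield, 2010 Year Treasury notes On Dec 31, 2023'
print(rewrite_singular_year(s))
# Closing Yield, year 2010 Treasury notes On Dec 31, 2023
```

Plural forms such as "5 years ago" are left untouched, so ordinary relative expressions keep their meaning.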

OK, thank you. I can work around this now that I know what the rules are. Could you point me to source so that I can see the ruleset? And is that user configurable?

The code base is relatively complex, and I don’t think this case is user configurable at the moment.
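
Until the parser changes, one possible workaround is to filter the search_dates output after the fact. The sketch below assumes the results are a list of (text, datetime) tuples, as in the output above; the helper name and the year bounds are arbitrary choices for illustration, not part of dateparser.

```python
import datetime


def drop_implausible(results, min_year=1900, max_year=2100):
    """Keep only matches whose year falls inside a plausible window."""
    return [
        (text, when)
        for text, when in results
        if min_year <= when.year <= max_year
    ]


# The result list from the report above.
results = [
    ('2010 Year', datetime.datetime(4033, 7, 31, 0, 0)),
    ('On Dec 31, 2023', datetime.datetime(2023, 12, 31, 0, 0)),
]
print(drop_implausible(results))
# [('On Dec 31, 2023', datetime.datetime(2023, 12, 31, 0, 0))]
```

This discards the spurious match rather than correcting it, so it only fits cases where the misparsed text is not needed.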

OK. I thought I had seen a definitions page listing the regex patterns it uses to parse, but if this isn't easy to fix, I'll work around it. Want me to close this out?

Want me to close this out?

No, I think this is a valid issue, and we want to eventually address it.