Year extraction not parsed correctly
PrajnyaSatish opened this issue
Prajnya commented
Is there a way to detect and ignore numerals that have text attached to them? For example, in "from 10 - 16:00 in 650EK", Duckling parses 650 as a year:
```json
[
  {
    "body" : "from 10 - 16:00 in 650",
    "dim" : "time",
    "end" : 22,
    "latent" : false,
    "start" : 0,
    "value" : {
      "from" : {
        "grain" : "minute",
        "value" : "0650-01-01T00:00:00.000-08:00"
      },
      "to" : {
        "grain" : "minute",
        "value" : "0650-01-01T16:00:00.000-08:00"
      },
      "type" : "interval",
      "values" : [
        {
          "from" : {
            "grain" : "minute",
            "value" : "0650-01-01T00:00:00.000-08:00"
          },
          "to" : {
            "grain" : "minute",
            "value" : "0650-01-01T16:00:00.000-08:00"
          },
          "type" : "interval"
        },
        {
          "from" : {
            "grain" : "minute",
            "value" : "0650-01-01T10:00:00.000-08:00"
          },
          "to" : {
            "grain" : "minute",
            "value" : "0650-01-01T16:00:00.000-08:00"
          },
          "type" : "interval"
        },
        {
          "from" : {
            "grain" : "minute",
            "value" : "0650-01-01T22:00:00.000-08:00"
          },
          "to" : {
            "grain" : "minute",
            "value" : "0650-01-02T16:00:00.000-08:00"
          },
          "type" : "interval"
        }
      ]
    }
  }
]
```
However, I do not want 650 to be parsed as a year.
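Because each entity carries `start` and `end` character offsets (here `"end" : 22`, which points at the `E` of `650EK`), one possible client-side workaround is to drop any match whose span touches a letter or digit on either side. This is a minimal Python sketch, assuming the Duckling JSON response has already been decoded into a list of dicts; `is_glued` and `drop_glued_matches` are hypothetical helper names, not part of Duckling.

```python
def is_glued(text: str, start: int, end: int) -> bool:
    """Return True if text[start:end] touches a letter or digit on either
    side, i.e. the match is only part of a larger word such as "650EK"."""
    before = text[start - 1] if start > 0 else " "
    after = text[end] if end < len(text) else " "
    return before.isalnum() or after.isalnum()


def drop_glued_matches(text: str, entities: list[dict]) -> list[dict]:
    # Hypothetical post-filter: keep only entities whose span is delimited
    # by whitespace, punctuation, or the edge of the input string.
    return [e for e in entities if not is_glued(text, e["start"], e["end"])]


text = "from 10 - 16:00 in 650EK"
entities = [{"body": "from 10 - 16:00 in 650", "dim": "time", "start": 0, "end": 22}]
print(drop_glued_matches(text, entities))  # → []
```

Note that this discards the whole time entity, including the "10 - 16:00" part, because Duckling returned it as a single match. Recovering that part would require re-parsing the input with the offending token removed or separated by a space, which is an assumption about the client's workflow rather than anything Duckling does itself.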
Steven Troxler commented
I think that might be hard to do: we want to split the text into separate tokens, because inputs like 25lbs need the number separated from the text (and so do some dates, like 2021AD).
As a result, I'm not sure whether the current Duckling engine can tell the rules layer that a number is directly attached to text with no space in between. cc @chessai, who knows more about the backend than I do.
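To illustrate the trade-off described above: the split that lets "25lbs" parse as a quantity is the same split that exposes "650" in "650EK". The Python sketch below is a hypothetical tokenizer that splits digits from letters but also records whether each token was glued to its neighbors, which is the kind of information the rules layer would need. It is not how Duckling's engine works internally, and the `ALLOWED_SUFFIXES` allowlist is an illustrative assumption.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    start: int
    end: int
    glued_left: bool   # no gap between this token and the previous one
    glued_right: bool  # no gap between this token and the next one


# Runs of digits or runs of letters; all other characters act as separators.
TOKEN_RE = re.compile(r"\d+|[A-Za-z]+")

# Hypothetical allowlist of suffixes that may legitimately follow a number.
ALLOWED_SUFFIXES = {"lbs", "AD"}


def tokenize(text: str) -> list[Token]:
    spans = [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]
    tokens = []
    for i, (tok, start, end) in enumerate(spans):
        glued_left = i > 0 and spans[i - 1][2] == start
        glued_right = i + 1 < len(spans) and spans[i + 1][1] == end
        tokens.append(Token(tok, start, end, glued_left, glued_right))
    return tokens


def suspicious_numbers(tokens: list[Token]) -> list[str]:
    """Numbers glued to a following word that is not an allowed suffix.
    Only suffixes are checked; prefixes like "EK650" are not handled."""
    out = []
    for i, tok in enumerate(tokens):
        if tok.text.isdigit() and tok.glued_right:
            if tokens[i + 1].text not in ALLOWED_SUFFIXES:
                out.append(tok.text)
    return out


print(suspicious_numbers(tokenize("25lbs in 650EK")))  # → ['650']
```

With adjacency recorded this way, "25lbs" and "2021AD" can still be split and parsed, while "650EK" is flagged because "EK" is not a known suffix. The cost is maintaining the allowlist, which is one reason this is harder to do generically inside the engine.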