Year extraction not parsed correctly

Question

PrajnyaSatish opened this issue 3 years ago

Is there a way to detect and ignore numerals that have text directly attached to them? For example, in "from 10 - 16:00 in 650EK" the 650 is parsed as a year:

[
   {
      "body" : "from 10 - 16:00 in 650",
      "dim" : "time",
      "end" : 22,
      "latent" : false,
      "start" : 0,
      "value" : {
         "from" : {
            "grain" : "minute",
            "value" : "0650-01-01T00:00:00.000-08:00"
         },
         "to" : {
            "grain" : "minute",
            "value" : "0650-01-01T16:00:00.000-08:00"
         },
         "type" : "interval",
         "values" : [
            {
               "from" : {
                  "grain" : "minute",
                  "value" : "0650-01-01T00:00:00.000-08:00"
               },
               "to" : {
                  "grain" : "minute",
                  "value" : "0650-01-01T16:00:00.000-08:00"
               },
               "type" : "interval"
            },
            {
               "from" : {
                  "grain" : "minute",
                  "value" : "0650-01-01T10:00:00.000-08:00"
               },
               "to" : {
                  "grain" : "minute",
                  "value" : "0650-01-01T16:00:00.000-08:00"
               },
               "type" : "interval"
            },
            {
               "from" : {
                  "grain" : "minute",
                  "value" : "0650-01-01T22:00:00.000-08:00"
               },
               "to" : {
                  "grain" : "minute",
                  "value" : "0650-01-02T16:00:00.000-08:00"
               },
               "type" : "interval"
            }
         ]
      }
   }
]

However, I do not want 650 to be parsed as a year.

Steven Troxler · Answer 1 · Tue Dec 21 2021 14:07:14 GMT+0800 (China Standard Time)

I think that might be hard to do. We want to split the text into separate tokens because inputs like 25lbs need the number separated from the text (the same goes for dates, like 2021AD).

As a result, though, I'm not sure whether the current Duckling engine can tell the rules layer that a number is followed by text with no space in between. But, cc @chessai who knows more about the backend than I do.
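
Until the engine exposes this, one possible workaround is to filter on the client side using the "start" and "end" offsets in the response. The sketch below is not part of Duckling: `drop_glued_entities` is a hypothetical helper, and it assumes the offsets index characters of the original input string. It drops any entity whose span touches a letter or digit with no space in between.

```python
def drop_glued_entities(text, entities):
    """Keep only entities whose span is not glued to a letter or digit.

    Hypothetical helper, not a Duckling API. Assumes each entity is a dict
    with integer "start" and "end" character offsets into `text`.
    """
    kept = []
    for entity in entities:
        start, end = entity["start"], entity["end"]
        # Character right before the span, if any, is alphanumeric.
        glued_before = start > 0 and text[start - 1].isalnum()
        # Character right after the span, if any, is alphanumeric.
        glued_after = end < len(text) and text[end].isalnum()
        if not (glued_before or glued_after):
            kept.append(entity)
    return kept


text = "from 10 - 16:00 in 650EK"
# Trimmed copy of the entity from the question (offsets 0..22).
entities = [{"body": "from 10 - 16:00 in 650", "dim": "time", "start": 0, "end": 22}]
print(drop_glued_entities(text, entities))
```

Note that this discards the whole entity, including the "10 - 16:00" part, so the input may need to be parsed again without the trailing "in 650EK" if the time interval is still wanted.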

Prajnya · Answer 2 · Tue Jan 04 2022 01:10:36 GMT+0800 (China Standard Time)

Hi @chessai, any comments?