GRAAL-Research / deepparse

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

Home Page: https://deepparse.org/

Proper splitting of hyphen-linked address components (unit-address)

davebulaval opened this issue

Originally posted by @MasseGuillaume in #136 (comment)

I think the parsing for apartments in Canada can be improved:

If you take a look at:
https://www.canadapost-postescanada.ca/cpc/en/support/kb/sending/general-information/how-to-address-mail-and-parcels

Put a hyphen between the unit/suite/apartment number and the street number. Don’t use the # symbol.

address_parser("1-123 Rue Toto Montreal Canada")

obtained:

FormattedParsedAddress<StreetNumber='1-123', StreetName='rue toto', Municipality='montreal', Province='canada'>

expected:

FormattedParsedAddress<StreetNumber='123', StreetName='rue toto', Unit='1', Municipality='montreal', Province='canada'>

NB. libpostal gives the same incorrect result:

docker run -d -p 8080:8080 clicksend/libpostal-rest  
curl -X POST -d '{"query": "1-123 rue toto Montreal Quebec Canada"}' localhost:8080/parser | jq "."
[
  {
    "label": "house_number",
    "value": "1-123"
  },
  {
    "label": "road",
    "value": "rue toto"
  },
  {
    "label": "city",
    "value": "montreal"
  },
  {
    "label": "state",
    "value": "quebec"
  },
  {
    "label": "country",
    "value": "canada"
  }
]
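
Until a parser handles this case natively, one possible workaround is to post-process the street number (or libpostal's house_number) and split it on the hyphen. The following is only a minimal sketch: `split_unit_street_number` is a hypothetical helper, not part of deepparse or libpostal, and it assumes the Canada Post convention quoted above, where the unit comes before the hyphen and the street number after it.

```python
import re

# Hypothetical helper, not part of deepparse or libpostal.
# Assumption: the value follows the "unit-street number" convention,
# e.g. "1-123" means unit 1 at street number 123.
_UNIT_STREET_NUMBER = re.compile(
    r"^(?P<unit>[0-9A-Za-z]+)-(?P<number>\d+[A-Za-z]?)$"
)


def split_unit_street_number(street_number):
    """Return (unit, street_number); unit is None when no split applies."""
    if street_number is None:
        return None, None
    match = _UNIT_STREET_NUMBER.match(street_number.strip())
    if match is None:
        # No hyphenated pattern found: leave the value untouched.
        return None, street_number
    return match.group("unit"), match.group("number")


print(split_unit_street_number("1-123"))  # ('1', '123')
print(split_unit_street_number("123"))    # (None, '123')
```

Such a rule is probably only safe when the address is known to be Canadian: in some regions a hyphenated house number is not a unit prefix (for example, the hyphenated house numbers used in Queens, New York), so applying it blindly could introduce new errors.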

Out-of-the-box performance of each model, evaluated on a new dataset of these cases, is as follows.

Model Type    Accuracy (%)
FastText      86.50
FastTextAtt   87.72
BPEmb         71.85
BPEmbAtt      87.81