GRAAL-Research / deepparse

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

Home Page: https://deepparse.org/


Few comments

freud14 opened this issue.

Hi,
Here are a few comments on the library.

Let's start with the big one. I noticed that the documentation website documents almost everything in the library. Is that on purpose? In software engineering, documenting something signals that you intend to support it and keep its interface stable. In my opinion, you should document only the parts that are essential to the library and keep the other parts as internal backends. Maybe it is because you want to support training, as we can see in issue #11, so you intend for users to use these parts for training? Anyway, just some thoughts for you. I would like to know what your intentions are.

Now, a few comments on the code I looked at.

https://github.com/MAYAS3/deepparse/blob/67674b892aa0809e3d8d4c1303624212aec13d7d/deepparse/parser/address_parser.py#L47

I think there should be default values for the parameters of the AddressParser class. I would suggest using device 0 by default if it exists, and otherwise falling back to the CPU. Maybe choose a default model too.
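A minimal sketch of the suggested defaults, assuming PyTorch as the backend (deepparse is built on it). The class name `AddressParserSketch`, its parameters, and the `"fasttext"` default model are illustrative assumptions, not deepparse's actual signature. The `torch` import is guarded so the sketch also runs where PyTorch is not installed.

```python
from typing import Optional


def resolve_device(device: Optional[str] = None) -> str:
    """Return the requested device, else GPU 0 if one is available, else the CPU."""
    if device is not None:
        # An explicit choice from the caller always wins.
        return device
    try:
        import torch  # assumed backend; guarded so the sketch runs without it
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


class AddressParserSketch:
    """Hypothetical parser that only illustrates constructor defaults."""

    def __init__(self, model_type: str = "fasttext", device: Optional[str] = None) -> None:
        # "fasttext" as the default model is an assumption for illustration.
        self.model_type = model_type
        self.device = resolve_device(device)


# No arguments are needed; the device falls back to the CPU when no GPU is found.
parser = AddressParserSketch()
print(parser.model_type, parser.device)
```

Keeping `device=None` as the sentinel lets a caller still force a specific device (for example `AddressParserSketch(device="cpu")`) while the common case needs no arguments.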

https://github.com/MAYAS3/deepparse/blob/67674b892aa0809e3d8d4c1303624212aec13d7d/deepparse/parser/address_parser.py#L90

When tagging an address, it would be nice if the returned value were a dictionary where the keys are the tags and the values are the words. For instance, instead of this:

{'350 rue des Lilas Ouest Québec Québec G1L 1B6': {'350': 'StreetNumber',
  'rue': 'StreetName',
  'des': 'StreetName',
  'Lilas': 'StreetName',
  'Ouest': 'StreetName',
  'Québec': 'Province',
  'G1L': 'PostalCode',
  '1B6': 'PostalCode'}}

it could be something like this:

{'350 rue des Lilas Ouest Québec Québec G1L 1B6': {'StreetNumber': '350',
  'StreetName': 'rue des Lilas Ouest',
  'Province': 'Québec',
  'PostalCode': 'G1L 1B6'}}

Notice how some tags were merged and the keys and values are inverted. Maybe there could be a flag for those who want the original format. Or, better yet, you could return an object.
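A minimal sketch of the merge-and-invert step, assuming the tagger can expose its output as an ordered list of (word, tag) pairs; `group_by_tag` is a hypothetical helper, not part of deepparse. A list is used instead of the word-keyed dictionary because a dictionary key can appear only once, so a repeated word such as the second "Québec" in the address cannot be kept there.

```python
from typing import Dict, List, Tuple


def group_by_tag(tagged: List[Tuple[str, str]]) -> Dict[str, str]:
    """Invert (word, tag) pairs into a tag -> words mapping.

    Words sharing a tag are joined with spaces in their original order,
    and tags keep the order in which they first appear.
    """
    grouped: Dict[str, List[str]] = {}
    for word, tag in tagged:
        grouped.setdefault(tag, []).append(word)
    return {tag: " ".join(words) for tag, words in grouped.items()}


# The word/tag pairs from the example above, in address order.
tagged = [
    ("350", "StreetNumber"),
    ("rue", "StreetName"),
    ("des", "StreetName"),
    ("Lilas", "StreetName"),
    ("Ouest", "StreetName"),
    ("Québec", "Province"),
    ("G1L", "PostalCode"),
    ("1B6", "PostalCode"),
]
print(group_by_tag(tagged))
# {'StreetNumber': '350', 'StreetName': 'rue des Lilas Ouest', 'Province': 'Québec', 'PostalCode': 'G1L 1B6'}
```

This sketch groups by tag rather than by adjacent runs, so words with the same tag that are not next to each other would still be joined into one string.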

Alright, that's it for now.

Good call on the doc #22
Same for default params #23

Really a good idea for the dictionary. Now we return an object #24
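A sketch of what such a result object could look like, assuming the same ordered (word, tag) pairs as input. `ParsedAddressSketch`, `TAG_TO_FIELD`, `from_pairs`, and `to_dict` are hypothetical names covering only the four tags shown in this thread; they are not the class introduced in #24.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from each tag name to the attribute that stores it.
TAG_TO_FIELD = {
    "StreetNumber": "street_number",
    "StreetName": "street_name",
    "Province": "province",
    "PostalCode": "postal_code",
}


@dataclass
class ParsedAddressSketch:
    """Hypothetical parse result exposing each tag as an attribute."""

    raw_address: str
    street_number: Optional[str] = None
    street_name: Optional[str] = None
    province: Optional[str] = None
    postal_code: Optional[str] = None

    @classmethod
    def from_pairs(cls, raw_address: str, tagged: List[Tuple[str, str]]) -> "ParsedAddressSketch":
        """Build the object from (word, tag) pairs, joining words that share a tag."""
        words_by_attr: Dict[str, List[str]] = {}
        for word, tag in tagged:
            attr = TAG_TO_FIELD[tag]  # raises KeyError for a tag this sketch does not know
            words_by_attr.setdefault(attr, []).append(word)
        return cls(raw_address, **{attr: " ".join(words) for attr, words in words_by_attr.items()})

    def to_dict(self) -> Dict[str, str]:
        """Return the tag -> words mapping, omitting tags that were not found."""
        return {
            tag: getattr(self, attr)
            for tag, attr in TAG_TO_FIELD.items()
            if getattr(self, attr) is not None
        }


address = ParsedAddressSketch.from_pairs(
    "350 rue des Lilas Ouest Québec Québec G1L 1B6",
    [("350", "StreetNumber"), ("rue", "StreetName"), ("des", "StreetName"),
     ("Lilas", "StreetName"), ("Ouest", "StreetName"), ("Québec", "Province"),
     ("G1L", "PostalCode"), ("1B6", "PostalCode")],
)
print(address.street_name)  # rue des Lilas Ouest
print(address.to_dict()["PostalCode"])  # G1L 1B6
```

Attribute access keeps calling code readable, while `to_dict` still offers the tag-to-words dictionary for callers who prefer it, which covers the "flag" idea without a separate parameter.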