GRAAL-Research / deepparse

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

Home Page: https://deepparse.org/

[BUG] Error during downloading the weights for the network bpemb.

IvanShift opened this issue

Hello!
It's impossible to download weights for this network. Could you upload this file somewhere else?

To Reproduce

 from deepparse.parser import AddressParser

 address_parser = AddressParser(model_type="bpemb", device=0)

Full error message:

/home/dev/.local/lib/python3.10/site-packages/deepparse/parser/address_parser.py:950: UserWarning: No CUDA device detected, device will be set to 'CPU'.
  warnings.warn("No CUDA device detected, device will be set to 'CPU'.")
Loading the embeddings model
/home/dev/.local/lib/python3.10/site-packages/deepparse/network/seq2seq.py:100: UserWarning: No pre-trained model where found in the cache directory /home/dev/.cache/deepparse. Thus, we willautomatically download the pre-trained model.
  warnings.warn(
Downloading the weights for the network bpemb.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 169, in _new_conn
    conn = connection.create_connection(
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 96, in create_connection
    raise err
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 86, in create_connection
    sock.connect(sa)
TimeoutError: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 1012, in _validate_conn
    conn.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 353, in connect
    conn = self._new_conn()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 174, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7fdd1426a4d0>, 'Connection to graal.ift.ulaval.ca timed out. (connect timeout=5)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='graal.ift.ulaval.ca', port=443): Max retries exceeded with url: /public/deepparse/bpemb.ckpt (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fdd1426a4d0>, 'Connection to graal.ift.ulaval.ca timed out. (connect timeout=5)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dev/.local/lib/python3.10/site-packages/deepparse/parser/address_parser.py", line 237, in __init__
    self._model_factory(
  File "/home/dev/.local/lib/python3.10/site-packages/deepparse/parser/address_parser.py", line 1051, in _model_factory
    self.model = BPEmbSeq2SeqModel(
  File "/home/dev/.local/lib/python3.10/site-packages/deepparse/network/bpemb_seq2seq.py", line 70, in __init__
    self._load_pre_trained_weights(model_weights_name, cache_dir=cache_dir)
  File "/home/dev/.local/lib/python3.10/site-packages/deepparse/network/seq2seq.py", line 104, in _load_pre_trained_weights
    download_weights(model_type, cache_dir, verbose=self.verbose)
  File "/home/dev/.local/lib/python3.10/site-packages/deepparse/tools.py", line 109, in download_weights
    download_from_public_repository(model, saving_dir, "ckpt")
  File "/home/dev/.local/lib/python3.10/site-packages/deepparse/tools.py", line 92, in download_from_public_repository
    r = requests.get(url, timeout=5)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 504, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='graal.ift.ulaval.ca', port=443): Max retries exceeded with url: /public/deepparse/bpemb.ckpt (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fdd1426a4d0>, 'Connection to graal.ift.ulaval.ca timed out. (connect timeout=5)'))

Expected behavior
The weights for this model download successfully.

Desktop:

  • OS: Ubuntu 22.04
  • Version: 0.9.1

Thank you for your interest in improving Deepparse.

This is a timeout error. Our server may have been momentarily down when you tried to download the pre-trained weights. Please try again.
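If the outage is only temporary, retrying with a growing delay can help. Below is a minimal sketch of that idea. The `retry` helper and its parameters are hypothetical and not part of Deepparse. It simply re-runs a callable when one of the listed exceptions is raised.

```python
import time


def retry(func, attempts=3, base_delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Hypothetical helper (not part of Deepparse). The wait doubles after each
    failed attempt: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            # Re-raise the last error once every attempt has been used.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

For the call in this issue, `func` could be `lambda: AddressParser(model_type="bpemb", device=0)` with `exceptions=(requests.exceptions.ConnectTimeout,)`, the exception shown at the end of the traceback. A retry only helps with a transient outage. It will not help if the host is unreachable from your network.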

I assume that access from Russia is blocked.

That's probably the case. You could use a VPN to download the pre-trained weights. However, we also verify that the cached model is the latest version, so the server is contacted even when the weights are already on disk. If you can download the weights, I can figure out a way to skip the version verification with an argument. Let me know if you would like to have this feature.

Yes, everything is OK with a VPN. It would be nice if you added this argument (to remove the version check).

Will do.

@IvanShift I have pushed a feature to dev that allows an AddressParser to be used offline. Here is an example:

address_parser = AddressParser("fasttext", offline=True)

You need to pre-download all the dependencies. For that, you can use our download_model CLI function.
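As a rough sketch, you could check that the weights are already on disk before building the parser offline. The helper names below are hypothetical. The cache directory comes from the warning in the log above, and the `<model_type>.ckpt` file name is an assumption based on the download URL. The check covers only the weights file, not other dependencies such as the embeddings.

```python
from pathlib import Path

# Cache directory reported in the warning message of the log above.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "deepparse"


def weights_are_cached(model_type, cache_dir=DEFAULT_CACHE_DIR):
    """Return True if <cache_dir>/<model_type>.ckpt exists.

    Assumption: the local file name matches the one in the download URL
    (for example bpemb.ckpt).
    """
    return (Path(cache_dir) / f"{model_type}.ckpt").is_file()


def build_offline_parser(model_type):
    """Build an offline AddressParser only if the weights file is present."""
    if not weights_are_cached(model_type):
        raise FileNotFoundError(
            f"No {model_type}.ckpt in {DEFAULT_CACHE_DIR}; download it first."
        )
    # Imported lazily so the cache check works without deepparse installed.
    from deepparse.parser import AddressParser

    # offline=True is the argument introduced in the dev branch.
    return AddressParser(model_type=model_type, offline=True)
```

Calling `build_offline_parser("fasttext")` then fails early with a clear message instead of a network timeout when the weights are missing.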

If you can take a look at it and come back to me with other improvements, or tell me if everything is OK, it would be appreciated.

You can install the dev version with pip install -U git+https://github.com/GRAAL-Research/deepparse.git@dev.

Thanks

Closing, since this has been merged into dev.