GRAAL-Research / deepparse

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

Home Page: https://deepparse.org/

Retrain attention: Caught KeyError in DataLoader worker process 0.

JonnoB opened this issue

When running the example code for retraining the attention model, I get an error when calling address_parser.retrain(). The error reads:

KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
return self.collate_fn(data)
File "/usr/local/lib/python3.8/dist-packages/deepparse/converter/data_transform.py", line 49, in teacher_forcing_transform
vectorize_batch_pairs = self.vectorizer(batch_pairs)
File "/usr/local/lib/python3.8/dist-packages/deepparse/vectorizer/train_vectorizer.py", line 29, in __call__
target_tmp = [self.tags_vectorizer(target) for target in address[1]]
File "/usr/local/lib/python3.8/dist-packages/deepparse/vectorizer/train_vectorizer.py", line 29, in <listcomp>
target_tmp = [self.tags_vectorizer(target) for target in address[1]]
File "/usr/local/lib/python3.8/dist-packages/deepparse/converter/target_converter.py", line 22, in __call__
return self.tags_to_idx[key]
KeyError: 'ATag'

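The error occurs because the dataset being downloaded is the one from the "retrain with new tags" example, so its targets contain tags such as 'ATag' that are not in the tag set the model is being retrained with.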
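To illustrate the failure, here is a minimal sketch of a tag-to-index lookup like the one in the traceback, plus a check that could be run on the training data before calling retrain(). The tag set, the sample address, and the function names vectorize_tags and find_unknown_tags are assumptions made up for this illustration; they are not deepparse's actual API.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed tag set for illustration only; the model's real tag set may differ.
TAGS_TO_IDX: Dict[str, int] = {
    "StreetNumber": 0,
    "StreetName": 1,
    "Municipality": 2,
    "PostalCode": 3,
    "EOS": 4,
}


def vectorize_tags(tags: Iterable[str], tags_to_idx: Dict[str, int]) -> List[int]:
    # Plain dictionary lookup: a tag missing from the mapping raises KeyError,
    # which is the kind of failure shown in the traceback.
    return [tags_to_idx[tag] for tag in tags]


def find_unknown_tags(
    dataset: Iterable[Tuple[str, List[str]]], tags_to_idx: Dict[str, int]
) -> Set[str]:
    # Collect every tag in the dataset that the mapping does not know about.
    return {tag for _, tags in dataset for tag in tags if tag not in tags_to_idx}


# Hypothetical training pair using a tag from a different tag set.
training_data = [("123 Main Street", ["StreetNumber", "StreetName", "ATag"])]

unknown = find_unknown_tags(training_data, TAGS_TO_IDX)
if unknown:
    print(f"Tags not known to the model: {sorted(unknown)}")
```

Running a check like this in the main process, before the DataLoader workers start, would surface the mismatch as a readable message instead of a KeyError raised inside a worker.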

The error disappears when the code referencing the dataset is changed to:

training_dataset_name = "sample_incomplete_data"
test_dataset_name = "test_sample_data"

This error would be easier to understand if the function download_from_url() took an explicit URL, so that it was possible to know where the data was coming from without looking through the source code. An alternative would be to provide a fixed list of data options (there appears to be one for the example datasets), which would limit errors to choosing an invalid option from a known list.
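As a rough sketch of the "fixed list" idea, the dataset name could be validated against known options before anything is downloaded. The names KNOWN_DATASETS and check_dataset_name are hypothetical, and the list only contains the two dataset names mentioned above; the real set of available files may be different.

```python
from typing import Sequence

# Hypothetical list of valid options; only the names used in this issue.
KNOWN_DATASETS = ("sample_incomplete_data", "test_sample_data")


def check_dataset_name(name: str, known: Sequence[str] = KNOWN_DATASETS) -> str:
    # Fail early with a message that lists the valid choices.
    if name not in known:
        options = ", ".join(sorted(known))
        raise ValueError(f"Unknown dataset '{name}'. Valid options: {options}")
    return name


training_dataset_name = check_dataset_name("sample_incomplete_data")
test_dataset_name = check_dataset_name("test_sample_data")
```

With a check like this, a wrong dataset name fails immediately with the list of valid options rather than later during training.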

You are right. I have fixed it and improved the download_from_url() documentation. I have also improved the error handling.

This will be included in 0.6.6, which I will release today.