Retrain attention: Caught KeyError in DataLoader worker process 0.
JonnoB opened this issue
When running the example code for retraining the attention model, I get an error at address_parser.retrain(). The error reads:
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.8/dist-packages/deepparse/converter/data_transform.py", line 49, in teacher_forcing_transform
    vectorize_batch_pairs = self.vectorizer(batch_pairs)
  File "/usr/local/lib/python3.8/dist-packages/deepparse/vectorizer/train_vectorizer.py", line 29, in __call__
    target_tmp = [self.tags_vectorizer(target) for target in address[1]]
  File "/usr/local/lib/python3.8/dist-packages/deepparse/vectorizer/train_vectorizer.py", line 29, in <listcomp>
    target_tmp = [self.tags_vectorizer(target) for target in address[1]]
  File "/usr/local/lib/python3.8/dist-packages/deepparse/converter/target_converter.py", line 22, in __call__
    return self.tags_to_idx[key]
KeyError: 'ATag'
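The error is caused by the data: the example downloads the dataset from the "retrain with new tags" example, which contains tags (such as 'ATag') that the default model does not know.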
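To make the failure mode concrete, here is a minimal sketch of what I think is happening. It is not deepparse's actual code: TagsConverter, find_unknown_tags, the tag names in default_tags and the sample address are all made up for illustration. Only the tags_to_idx[key] lookup and the 'ATag' key come from the traceback above.

```python
class TagsConverter:
    """Hypothetical stand-in for a tag-to-index converter (not deepparse's class)."""

    def __init__(self, tags_to_idx):
        self.tags_to_idx = tags_to_idx

    def __call__(self, key):
        # A plain dict lookup raises KeyError for any tag missing from the mapping.
        return self.tags_to_idx[key]


def find_unknown_tags(dataset, known_tags):
    """Return the sorted tags used in the dataset that are absent from known_tags."""
    return sorted({tag for _, tags in dataset for tag in tags if tag not in known_tags})


# Illustrative values only; the real default tag set may differ.
default_tags = {"StreetNumber": 0, "StreetName": 1, "Municipality": 2}
new_tags_dataset = [("350 main street", ["ATag", "AnotherTag", "AnotherTag"])]

converter = TagsConverter(default_tags)
unknown = find_unknown_tags(new_tags_dataset, default_tags)
print(unknown)
```

Running a check like find_unknown_tags before training would report the mismatch up front, instead of a KeyError surfacing from inside a DataLoader worker.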
The error disappears when the code referencing the dataset is changed to:
training_dataset_name = "sample_incomplete_data"
test_dataset_name = "test_sample_data"
This error would be easier to understand if the function download_from_url() took an explicit URL, which would make it possible to know where the data comes from without looking through the source code. An alternative would be to provide a fixed list of data options (there appears to be one for the example datasets), which would limit errors to choosing an invalid option from a known list.
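A rough sketch of the "fixed list" idea, under my own assumptions: KNOWN_DATASETS and validate_dataset_name are hypothetical names, not part of deepparse, and the list contains only the two dataset names quoted in this issue.

```python
# Hypothetical list of allowed names; only these two are mentioned in this issue.
KNOWN_DATASETS = ("sample_incomplete_data", "test_sample_data")


def validate_dataset_name(name, known=KNOWN_DATASETS):
    """Return name unchanged if it is a known dataset, otherwise raise ValueError."""
    if name not in known:
        options = ", ".join(known)
        raise ValueError(f"Unknown dataset {name!r}; choose one of: {options}")
    return name


training_dataset_name = validate_dataset_name("sample_incomplete_data")
test_dataset_name = validate_dataset_name("test_sample_data")
```

With a check like this, a wrong name fails immediately with a message listing the valid options, before any download or training starts.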
You are right. I have fixed it and improved the download_from_url() documentation. I have also improved error handling.
This will be included in 0.6.6, which I will release today.