KeyError in Vocab
subhashree8 opened this issue
I am getting a KeyError on line 41 of utils/dataset.py regardless of which input entity or relation is given. ent_vocab[entity_name] simply doesn't work. Has the structure of the vocabulary been changed recently?
Hi, @subhashree8
Sorry for the late reply. I haven't modified the vocab-related code so far.
Can you show me the command you ran?
I suspect that some entities in the entity list file don't match the ones in the training triplets.
If you are using a dataset other than wordnet-mlj12 or FB15k, please check that.
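The mismatch described above can be checked directly. The following is a minimal sketch, not code from this repository: `read_list` and `find_missing` are hypothetical helper names, and it assumes the files are UTF-8 text with tab-separated triplets.

```python
def read_list(path):
    """Return the set of non-empty lines in a list file (one ID or name per line)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def find_missing(triplet_path, ent_path, rel_path):
    """Return (entities, relations) used in the triplets but absent from the lists."""
    entities = read_list(ent_path)
    relations = read_list(rel_path)
    missing_ents, missing_rels = set(), set()
    with open(triplet_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            head, rel, tail = line.split("\t")
            # Record every head/tail entity that the entity list does not contain.
            missing_ents.update(e for e in (head, tail) if e not in entities)
            if rel not in relations:
                missing_rels.add(rel)
    return missing_ents, missing_rels
```

If both returned sets are empty, every name in the triplet file has an exact string match in the list files. A non-empty result names the keys that would raise a KeyError.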
Hi @mana-ysh, this is the command I ran:
run train.py --mode single --ent train.entlist --rel train.rellist --train wordnet-mlj12-train.txt --valid wordnet-mlj12-valid.txt --log C:\Users\subha_000\Documents\knowledge-graph-embeddings-master\knowledge-graph-embeddings-master\src\logs
I created the entlist and rellist in the same way as described in your pre-processing file.
Does that mean you ran preprocessing.sh?
Yes
Thank you for replying.
It works in my environment.
Can you check that your generated files match mine?
▶ head train.entlist
00001740
00001930
00002137
00002325
00002452
00002573
00002684
00002724
00002942
00003316
▶ head train.rellist
_also_see
_derivationally_related_form
_has_part
_hypernym
_hyponym
_instance_hypernym
_instance_hyponym
_member_holonym
_member_meronym
_member_of_domain_region
▶ head wordnet-mlj12-train.txt
03964744 _hyponym 04371774
00260881 _hypernym 00260622
02199712 _member_holonym 02188065
01332730 _derivationally_related_form 03122748
06066555 _derivationally_related_form 00645415
09322930 _instance_hypernym 09360122
11575425 _hyponym 12255934
07193596 _derivationally_related_form 00784342
05726596 _hyponym 06162979
01768969 _derivationally_related_form 02636811
▶ head wordnet-mlj12-valid.txt
02174461 _hypernym 02176268
05074057 _derivationally_related_form 02310895
08390511 _synset_domain_topic_of 08199025
02045024 _member_meronym 02046321
04758181 _hypernym 04757864
09419536 _instance_hypernym 09411430
12165384 _hypernym 12163824
09384921 _part_of 08853741
04881998 _derivationally_related_form 01299888
00612652 _derivationally_related_form 01004072
Triplet files are tab-separated, and the entity/relation list files contain one ID or name per line.
Thanks a lot for replying.
Yes, I have exactly the same files. When I print the keys of rel_vocab along with their type and length of the key string, I get this:
_also_see <class 'str'> 20
<class 'str'> 1
_derivationally_related_form <class 'str'> 57
_has_part <class 'str'> 19
_hypernym <class 'str'> 19
_hyponym <class 'str'> 17
_instance_hypernym <class 'str'> 37
_instance_hyponym <class 'str'> 35
_member_holonym <class 'str'> 31
_member_meronym <class 'str'> 31
_member_of_domain_region <class 'str'> 49
_member_of_domain_topic <class 'str'> 47
_member_of_domain_usage <class 'str'> 47
_part_of <class 'str'> 17
_similar_to <class 'str'> 23
_synset_domain_region_of <class 'str'> 49
_synset_domain_topic_of <class 'str'> 47
_synset_domain_usage_of <class 'str'> 47
_verb_group <class 'str'> 23
I am not sure why the 2nd one is a blank key. Also, when I print the length of the key "_hyponym" which is given as input for rel_vocab in line 41, it shows:
_hyponym <class 'str'> 8
Could the difference in the length of the same string in these two places be the cause of the problem?
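One reading consistent with these numbers (an assumption, not something confirmed in this thread) is that the list file was written as UTF-16 with a byte order mark and CRLF line endings, then read back as single-byte text. Every character would then be followed by a NUL character, which roughly doubles each length and produces a one-character key that prints as blank. The sketch below reproduces that effect in memory. The function names are hypothetical, and Latin-1 stands in for whatever default encoding the reader used.

```python
import codecs
import io


def encode_like_utf16_file(names):
    """Encode names as UTF-16 LE bytes with a BOM and CRLF line endings."""
    text = "".join(name + "\r\n" for name in names)
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def has_utf16_bom(raw):
    """True if the raw bytes start with a UTF-16 byte order mark."""
    return raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


def keys_as_misread(raw):
    """Read the bytes as Latin-1 text lines and return the distinct keys in order."""
    # TextIOWrapper uses universal newlines by default, as open() does in text mode.
    reader = io.TextIOWrapper(io.BytesIO(raw), encoding="latin-1")
    keys = {}
    for line in reader:
        keys.setdefault(line.rstrip("\n"), len(keys))
    return list(keys)


raw = encode_like_utf16_file(["_also_see", "_hyponym"])
lengths = [len(key) for key in keys_as_misread(raw)]
```

For these two names `lengths` is `[20, 1, 17]`, which matches the values printed above for `_also_see`, the blank key, and `_hyponym`. Opening the real list file in binary mode and passing its first bytes to `has_utf16_bom` would show whether this explanation applies.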
This issue is caused by running the preprocessing shell script. The script generates different list files depending on the Unix environment. I will upload the already preprocessed files in the near future to resolve this.
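In the meantime, the list files can be regenerated without relying on the shell environment. This is a minimal sketch rather than the repository's preprocessing script: `write_lists` is a hypothetical name, and it assumes tab-separated UTF-8 triplet files and sorted output with one name per line. Which triplet files the original script reads is not stated here, so the caller passes them explicitly.

```python
def write_lists(triplet_paths, ent_path, rel_path):
    """Collect entities and relations from triplet files and write sorted list files."""
    entities, relations = set(), set()
    for path in triplet_paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                head, rel, tail = line.split("\t")
                entities.update((head, tail))
                relations.add(rel)
    for out_path, items in ((ent_path, entities), (rel_path, relations)):
        # Fix the encoding and line ending so the output is the same on every platform.
        with open(out_path, "w", encoding="utf-8", newline="\n") as f:
            for item in sorted(items):
                f.write(item + "\n")
    return len(entities), len(relations)
```

Passing `encoding` and `newline` explicitly is the point of the sketch: the bytes written no longer depend on the platform's defaults.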
Thanks @subhashree8 for your cooperation!