mana-ysh / knowledge-graph-embeddings

Implementations of Embedding-based methods for Knowledge Base Completion tasks

KeyError in Vocab

subhashree8 opened this issue

I am getting a KeyError at line 41 of utils/dataset.py regardless of which entity or relation is given as input. The lookup ent_vocab[entity_name] always fails. Has the structure of the vocabulary changed recently?

Hi, @subhashree8

Sorry for the late reply. I haven't modified the vocab-related code so far.
Can you show me the command you ran?
I suspect that the entities in the entity list file do not match the ones in the training triplets.
If you are not using the wordnet-mlj12 or FB15k dataset, please check this.

Hi @mana-ysh, this is the command I ran:
run train.py --mode single --ent train.entlist --rel train.rellist --train wordnet-mlj12-train.txt --valid wordnet-mlj12-valid.txt --log C:\Users\subha_000\Documents\knowledge-graph-embeddings-master\knowledge-graph-embeddings-master\src\logs

I had created the entlist and rellist in the same manner as you had mentioned in your pre-processing file.

> I had created the entlist and rellist in the same manner as you had mentioned in your pre-processing file.

Does that mean you ran preprocessing.sh?

Thank you for replying.
It works in my environment.

Can you check that you get the same files as I do?

▶  head train.entlist
00001740
00001930
00002137
00002325
00002452
00002573
00002684
00002724
00002942
00003316
▶  head train.rellist
_also_see
_derivationally_related_form
_has_part
_hypernym
_hyponym
_instance_hypernym
_instance_hyponym
_member_holonym
_member_meronym
_member_of_domain_region
▶  head wordnet-mlj12-train.txt
03964744	_hyponym	04371774
00260881	_hypernym	00260622
02199712	_member_holonym	02188065
01332730	_derivationally_related_form	03122748
06066555	_derivationally_related_form	00645415
09322930	_instance_hypernym	09360122
11575425	_hyponym	12255934
07193596	_derivationally_related_form	00784342
05726596	_hyponym	06162979
01768969	_derivationally_related_form	02636811
▶  head wordnet-mlj12-valid.txt
02174461	_hypernym	02176268
05074057	_derivationally_related_form	02310895
08390511	_synset_domain_topic_of	08199025
02045024	_member_meronym	02046321
04758181	_hypernym	04757864
09419536	_instance_hypernym	09411430
12165384	_hypernym	12163824
09384921	_part_of	08853741
04881998	_derivationally_related_form	01299888
00612652	_derivationally_related_form	01004072

Triplet files are tab-separated, and the entity and relation list files contain one ID or name per line.
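Given that format, a quick way to test whether the list files and the triplet files agree is to look for names that appear in the triplets but not in the lists. The following is only a minimal sketch, not code from this repository: the helper names (read_list, read_triplets, missing_names) are hypothetical, and it assumes UTF-8 files in which every non-empty triplet line has exactly three tab-separated fields.

```python
def read_list(path):
    """Read one name per line, dropping line endings and skipping empty lines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f if line.strip()]


def read_triplets(path):
    """Read tab-separated (subject, relation, object) lines as tuples."""
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\r\n").split("\t")) for line in f if line.strip()]


def missing_names(triplets, entities, relations):
    """Return the entity and relation names used in triplets but absent from the lists."""
    known_ents, known_rels = set(entities), set(relations)
    missing_ents = {x for sub, _, obj in triplets for x in (sub, obj) if x not in known_ents}
    missing_rels = {rel for _, rel, _ in triplets if rel not in known_rels}
    return missing_ents, missing_rels
```

If both returned sets are empty, every name in the triplet file has an entry in the lists. A non-empty set points at the names that would raise a KeyError during lookup.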

Thanks a lot for replying.
Yes, I have exactly the same files. When I print each key of rel_vocab along with its type and the length of the key string, I get this:
_also_see <class 'str'> 20
<class 'str'> 1
_derivationally_related_form <class 'str'> 57
_has_part <class 'str'> 19
_hypernym <class 'str'> 19
_hyponym <class 'str'> 17
_instance_hypernym <class 'str'> 37
_instance_hyponym <class 'str'> 35
_member_holonym <class 'str'> 31
_member_meronym <class 'str'> 31
_member_of_domain_region <class 'str'> 49
_member_of_domain_topic <class 'str'> 47
_member_of_domain_usage <class 'str'> 47
_part_of <class 'str'> 17
_similar_to <class 'str'> 23
_synset_domain_region_of <class 'str'> 49
_synset_domain_topic_of <class 'str'> 47
_synset_domain_usage_of <class 'str'> 47
_verb_group <class 'str'> 23

I am not sure why the second key is blank. Also, when I print the length of the key "_hyponym" that is passed to rel_vocab at line 41, it shows:
_hyponym <class 'str'> 8

Could the difference in length for the same string in the two places be the cause of the problem?
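The printed lengths follow a pattern: an n-character name shows up with length 2n+1 (2n+2 for the first key), and there is one extra key of length 1. One explanation consistent with those numbers, though not confirmed anywhere in this thread, is that the list files were saved as UTF-16 with a byte order mark and CRLF line endings and then read one byte per character. The sketch below simulates that under this assumption. The helper as_utf16_file is hypothetical, and latin-1 merely stands in for whatever single-byte encoding the reader used.

```python
def as_utf16_file(names):
    """Return the bytes of a UTF-16 LE file with a BOM and CRLF line endings."""
    text = "".join(name + "\r\n" for name in names)
    return b"\xff\xfe" + text.encode("utf-16-le")


raw = as_utf16_file(["_also_see", "_hyponym"])

# Decoding one byte per character keeps the BOM and a NUL after every letter.
misread = raw.decode("latin-1").splitlines()
for key in dict.fromkeys(misread):  # duplicates collapse, as with dict keys
    print(repr(key), len(key))      # the lengths printed are 20, 1 and 17

# Decoding with the matching codec recovers the plain names.
clean = raw.decode("utf-16").splitlines()
print(clean)
```

Printing repr(key) instead of the key itself makes the hidden characters visible, which is why a name that looks like "_hyponym" on screen can still fail to match the 8-character "_hyponym" used for the lookup.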

This issue is caused by running the preprocessing shell script. The script generates different list files depending on the Unix environment. I will upload the already preprocessed files in the near future to resolve this.
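Until preprocessed files are available, the list files could also be produced without relying on the shell environment. The following is a minimal sketch under stated assumptions and not the repository's preprocessing script: the function names build_lists and write_list are hypothetical, the names are written in sorted order (which matches the heads shown earlier in this thread but is otherwise a guess), and each non-empty triplet line is assumed to have exactly three tab-separated fields.

```python
def build_lists(triplet_lines):
    """Collect sorted unique entity and relation names from tab-separated triplets."""
    entities, relations = set(), set()
    for line in triplet_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        sub, rel, obj = line.split("\t")
        entities.update((sub, obj))
        relations.add(rel)
    return sorted(entities), sorted(relations)


def write_list(path, names):
    """Write one name per line as UTF-8 with LF line endings on every platform."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for name in names:
            f.write(name + "\n")
```

For example, passing an open triplet file to build_lists and each returned list to write_list yields the two list files. Fixing both the encoding and the newline in open() is the point of the sketch, since it keeps the output identical across operating systems and shells.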

Thanks @subhashree8 for your cooperation!