miyyer / scpn

syntactically controlled paraphrase networks

Some questions about trans_embs

shuangqinbuaa opened this issue · comments

Hello, thank you so much for sharing the code with us; I have learned a lot. I have some questions about the trans_embs in this code.

  1. In train_scpn.py, SCPN's parameter "len_parse_voc" is 103, which means the parse vocabulary doesn't include the token 'EOP'. But during SCPN training, indexify_transformations() is called to get valid transformation instances. This function calls deleaf(), which appends 'EOP' to the end of the parse. Since 'EOP' is not in the parse vocabulary, converting the parse tags to indices will fail.
  2. The parses generated by ParseNet may contain the token 'EOP', but the shape of trans_embs in SCPN is (103, 56), which means the embedding table has no row for 'EOP'. This will cause errors when running generate_paraphrases.py.
  3. SCPN and ParseNet use different trans_embs. What if they shared the same trans_embs?
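
To make question 1 concrete, here is a minimal sketch of the mismatch. It is not the repository's code: the four-tag list, `build_label_voc`, and `indexify` are hypothetical stand-ins, and only the 'EOP' filtering mirrors what is described above.

```python
# Hypothetical miniature tag list standing in for the real tag file.
tags = ['S', 'NP', 'VP', 'EOP']

def build_label_voc(tag_lines, keep_eop):
    """Map each tag to its line index, optionally skipping 'EOP'."""
    voc = {}
    for idx, line in enumerate(tag_lines):
        line = line.strip()
        if keep_eop or line != 'EOP':
            voc[line] = idx
    return voc

def indexify(parse_tags, voc):
    """Convert parse tags to indices; raises KeyError for unknown tags."""
    return [voc[t] for t in parse_tags]

# Stand-in for a parse that ends in 'EOP', as deleaf() is said to produce.
parse = ['S', 'NP', 'VP', 'EOP']

voc_without_eop = build_label_voc(tags, keep_eop=False)
voc_with_eop = build_label_voc(tags, keep_eop=True)

try:
    indexify(parse, voc_without_eop)
    lookup_failed = False
except KeyError:
    # 'EOP' was filtered out of the vocabulary, so the lookup fails.
    lookup_failed = True

indices = indexify(parse, voc_with_eop)
print(lookup_failed, indices)
```

If this sketch matches the real behaviour, either the vocabulary has to keep 'EOP' or the parses must not contain it; mixing the two is what breaks the lookup.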

For question 2, when I run generate_paraphrases.py, one of the generated full parses is as follows:
[screenshot not preserved]
The generated full parses may not end with the index of 'EOP', which is 103, so the underlined operation (in the screenshot below) may not work.
[screenshot not preserved]
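
As a rough illustration of question 2, the sketch below checks generated parse indices against an embedding table size and cuts a parse at 'EOP' only when 'EOP' is actually present. The helper names `out_of_range` and `truncate_at_eop` and the sample index lists are assumptions for illustration; only the values 103 (the index of 'EOP') and 103 rows come from this issue.

```python
EOP_IDX = 103  # index of 'EOP' as reported in this issue

def out_of_range(indices, num_rows):
    """Return the indices that have no row in a table with num_rows rows."""
    return [i for i in indices if not 0 <= i < num_rows]

def truncate_at_eop(indices, eop_idx=EOP_IDX):
    """Cut a parse at the first EOP; keep it whole if EOP never appears."""
    if eop_idx in indices:
        return indices[:indices.index(eop_idx)]
    return list(indices)

# Made-up generated parses: one ends with EOP, one does not.
with_eop = [5, 17, 42, 103]
without_eop = [5, 17, 42]

# With 103 rows, index 103 has no embedding; with 104 rows it does.
bad_103 = out_of_range(with_eop, num_rows=103)
bad_104 = out_of_range(with_eop, num_rows=104)

cut_a = truncate_at_eop(with_eop)
cut_b = truncate_at_eop(without_eop)
print(bad_103, bad_104, cut_a, cut_b)
```

Checking for 'EOP' before slicing avoids relying on every generated parse ending with it.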

thanks for pointing this out; we may have introduced a bug while cleaning up the code for release. i'll look into it on monday and get back to you!

Hi @miyyer and @jwieting,

I'm facing the same problem. What is the solution?

Should we add 'EOP' to the end of the parse for the input and output text (i.e. to the result of deleaf()) in train_scpn, or not?

label_voc = {}
for idx, line in enumerate(tag_file):
    line = line.strip()
    if line != 'EOP':  # 'EOP' is skipped, so it never gets an index
        label_voc[line] = idx
rev_label_voc = dict((v, k) for (k, v) in label_voc.items())  # .items() works on Python 2 and 3

In the code above, if we include 'EOP' by commenting out the `if line != 'EOP':` check, then model.trans_embs.weight would have size (104, 56), which means 'EOP' has to be present in the label_voc dictionary. Otherwise the size is (103, 56) and 'EOP' is skipped in label_voc. If we keep the code as it is, indexify_transformations() raises a KeyError because 'EOP' is not in label_voc.
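
One way to keep the two sizes in sync, sketched under assumptions: keep 'EOP' in the vocabulary and derive the embedding table's row count from the vocabulary instead of hard-coding 103. The placeholder tag names, `build_vocs`, and `d_trans` are hypothetical; whether the authors intended this is not confirmed in this thread.

```python
def build_vocs(tag_lines):
    """Build tag->index and index->tag maps, keeping every tag including 'EOP'."""
    label_voc = {line.strip(): idx for idx, line in enumerate(tag_lines)}
    rev_label_voc = {idx: tag for tag, idx in label_voc.items()}
    return label_voc, rev_label_voc

# Placeholder tag file: 103 made-up tags followed by 'EOP' at index 103.
tag_lines = ['TAG%d\n' % i for i in range(103)] + ['EOP\n']

label_voc, rev_label_voc = build_vocs(tag_lines)

# Derive the table size from the vocabulary so the two cannot drift apart.
len_parse_voc = len(label_voc)
d_trans = 56  # embedding width quoted in this issue
trans_embs_shape = (len_parse_voc, d_trans)
print(trans_embs_shape, label_voc['EOP'])
```

With this setup the embedding table has a row for 'EOP', so parses ending in 'EOP' can be indexed and embedded without special handling.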

Thanks