hsqmlzno1 / HATN

Hierarchical Attention Transfer Network for Cross-domain Sentiment Classification (AAAI'18)

Test data in vocabulary preparation

avinashsai opened this issue

Hi,
Congratulations on your amazing work. I have a question about the vocabulary preparation at line 47 in utils.py.
The testing data is also used to build the vocabulary. However, shouldn't the testing data be completely unseen?
Please correct me if I am wrong.

Thank you

Testing data should be used in vocabulary preparation. Otherwise, you cannot learn the semantics and information of any target-specific words.

If a large amount of unlabeled data is available in the target domain, the vocabulary built from the target unlabeled data is enough to cover the testing data of the target domain. In this case, I think it's not necessary to use the vocabulary from the testing data.
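
One way to check whether the target unlabeled data already covers the testing data is to build the vocabulary with and without the test split and compare token coverage. The snippet below is only a minimal sketch of that idea, not the code in utils.py: `build_vocab`, `coverage`, the `<pad>`/`<unk>` entries, and the toy sentences are all hypothetical and chosen for illustration.

```python
from collections import Counter


def build_vocab(corpora, min_count=1):
    """Map every token seen at least min_count times to an integer id.

    corpora is a list of corpora; each corpus is a list of tokenized sentences.
    (Hypothetical helper, not taken from the HATN repository.)
    """
    counts = Counter(tok for corpus in corpora for sent in corpus for tok in sent)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, count in sorted(counts.items()):
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def coverage(vocab, corpus):
    """Fraction of token occurrences in corpus that have a vocabulary entry."""
    tokens = [tok for sent in corpus for tok in sent]
    if not tokens:
        return 1.0
    return sum(tok in vocab for tok in tokens) / len(tokens)


# Toy data (assumed): source labeled reviews, target unlabeled reviews,
# and target testing reviews. Only the text is used, never the labels.
source_labeled = [["great", "plot"], ["boring", "movie"]]
target_unlabeled = [["battery", "lasts", "long"], ["great", "screen"]]
target_test = [["battery", "died"], ["great", "screen"]]

# Vocabulary without the testing data vs. with it.
vocab_without_test = build_vocab([source_labeled, target_unlabeled])
vocab_with_test = build_vocab([source_labeled, target_unlabeled, target_test])

cov_without = coverage(vocab_without_test, target_test)
cov_with = coverage(vocab_with_test, target_test)
print(f"coverage without test vocabulary: {cov_without:.2f}")
print(f"coverage with test vocabulary:    {cov_with:.2f}")
```

If the coverage without the test split is already close to 1.0 on real data, leaving the testing data out of the vocabulary should cost little; any remaining test-only words would simply map to `<unk>`.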