malllabiisc / RESIDE

EMNLP 2018: RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information

Problem about the GIDS dataset

ShomyLiu opened this issue · comments

Hi,

I checked the GIDS dataset and the max_len of all sentences is 100. However, some values in SubPos and ObjPos are larger than 100.

So it seems that the max_len of 100 is the length after preprocessing rather than the real maximum length of the sentences. Is that right?

Thanks

Hi @ShomyLiu,
Yes, that is true. Based on the distribution of sentence lengths in the GIDS dataset, we decided to fix max_len to 100. Using the real maximum length would require a lot of padding, and things would then not fit in GPU memory. The majority of sentences have fewer than 100 words, so this decision doesn't affect performance much.
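
For readers hitting the same question, here is a minimal sketch of how a fixed length cap can leave entity positions larger than the cap. It is not the RESIDE preprocessing code: the helper names `truncate_and_pad` and `entity_outside_window`, the `<pad>` token, and the assumption that positions are token indices in the original sentence are all hypothetical, chosen only for illustration.

```python
MAX_LEN = 100  # the fixed cap mentioned in this thread


def truncate_and_pad(tokens, max_len=MAX_LEN, pad="<pad>"):
    """Cut a token list to max_len, then right-pad it to exactly max_len."""
    kept = tokens[:max_len]
    return kept + [pad] * (max_len - len(kept))


def entity_outside_window(entity_index, max_len=MAX_LEN):
    """True if an entity's original token index falls beyond the kept window.

    Assumption: entity_index is a 0-based index into the untruncated sentence.
    """
    return entity_index >= max_len


if __name__ == "__main__":
    # A made-up 130-token sentence whose subject entity sits at index 120.
    tokens = [f"w{i}" for i in range(130)]
    sub_index = 120

    fixed = truncate_and_pad(tokens)
    print(len(fixed))                        # length after the cap is applied
    print(entity_outside_window(sub_index))  # index computed before truncation
```

Under these assumptions, any position computed on the original sentence can exceed 100 even though every stored sentence has length 100. A model using position embeddings would then need to clip such indices or size the embedding table for them.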