Problem about the GIDS dataset
ShomyLiu opened this issue
Hi,
I checked and found that the max_len of all sentences in the GIDS dataset is 100. However, some values in SubPos and ObjPos are larger than 100.
So it seems that the max_len of 100 is the length after preprocessing rather than the real maximum length of the sentences. Is that right?
Thanks
Hi @ShomyLiu,
Yes, that is true. Based on the distribution of sentence lengths in the GIDS dataset, we decided to fix max_len to 100. Using the real maximum length would require a lot of padding, and the padded batches would not fit in GPU memory. The majority of sentences have fewer than 100 words, so this decision doesn't affect performance much.
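
For anyone who wants to check this on their own copy of the data, here is a minimal sketch of what such a cap could look like. It is not the repository's actual preprocessing code: the helper names `truncate_example` and `length_coverage` are hypothetical, and it assumes SubPos and ObjPos are plain token indices into the sentence, which may not match the real data format.

```python
MAX_LEN = 100  # the cap discussed in this thread


def truncate_example(tokens, sub_pos, obj_pos, max_len=MAX_LEN):
    """Cut a sentence to max_len tokens and clip entity indices into range.

    Assumption: sub_pos and obj_pos are token indices (hypothetical format).
    """
    tokens = tokens[:max_len]
    last = len(tokens) - 1

    def clip(i):
        # Keep the index inside [0, last] so it still points at a kept token.
        return max(0, min(i, last))

    return tokens, clip(sub_pos), clip(obj_pos)


def length_coverage(lengths, max_len=MAX_LEN):
    """Fraction of sentences whose length is at most max_len."""
    return sum(1 for n in lengths if n <= max_len) / len(lengths)


if __name__ == "__main__":
    sentence = ["w%d" % i for i in range(150)]
    toks, s, o = truncate_example(sentence, sub_pos=120, obj_pos=5)
    print(len(toks), s, o)
    print(length_coverage([50, 100, 150, 80]))
```

Running `length_coverage` over the real sentence lengths shows how many sentences the cap leaves untouched. Clipping out-of-range indices is only one option; dropping such examples or windowing around the entities would be equally reasonable choices.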