malllabiisc / RESIDE

EMNLP 2018: RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information

Question about the PR curves of the GIDS dataset

YaNjIeE opened this issue

Hi,
I would like to know whether the PR curves for the GIDS dataset are obtained the same way as for the NYT dataset, where only the non-NA labels are counted.
Also, how many data points from the test set did you plot in the PR curves?

BTW, I noticed that the GIDS dataset has a dev set. Did you use it, or only the train and test sets?

Looking forward to your reply.

Best.

Hi,
Yes, for drawing the PR curves on the GIDS dataset we followed exactly the same procedure that we used for NYT.
Unlike NYT, GIDS comes with a dev set.
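
For readers following along, here is a minimal sketch of how a non-NA PR curve is commonly computed in distantly-supervised relation extraction. It is an illustration under assumptions, not the RESIDE evaluation code: the function name `pr_curve_non_na`, the input layout (one score list and one gold-label set per bag), and `na_index=0` are all hypothetical. Under this scheme every (bag, non-NA relation) pair contributes one point, so the curve would have `number of test bags × (number of relations − 1)` points.

```python
def pr_curve_non_na(probs, labels, na_index=0):
    """Precision/recall over (bag, relation) pairs, skipping the NA relation.

    probs  : list of per-bag score lists, one score per relation (assumed layout)
    labels : list of sets with the gold relation indices of each bag (assumed layout)
    """
    scored = []      # (score, is_positive) for every non-NA (bag, relation) pair
    total_pos = 0    # number of gold non-NA facts
    for bag_probs, gold in zip(probs, labels):
        for rel, score in enumerate(bag_probs):
            if rel == na_index:
                continue  # NA predictions and NA gold labels are not counted
            is_pos = rel in gold
            total_pos += is_pos
            scored.append((score, is_pos))

    # Rank all pairs by confidence, highest first
    scored.sort(key=lambda pair: pair[0], reverse=True)

    precision, recall = [], []
    true_pos = 0
    for k, (_, is_pos) in enumerate(scored, start=1):
        true_pos += is_pos
        precision.append(true_pos / k)
        recall.append(true_pos / total_pos if total_pos else 0.0)
    return precision, recall


# Toy example: 2 bags, 3 relations (index 0 is NA); the second bag is an NA bag
probs = [[0.1, 0.7, 0.2], [0.8, 0.05, 0.15]]
labels = [{1}, {0}]
precision, recall = pr_curve_non_na(probs, labels)
```

The toy input yields four points (2 bags × 2 non-NA relations), which is the sense in which the NA class is excluded from the curve.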

OK, thanks a lot.
I want to ask another question: what is probY in the dataset?
In the README you said it's the relation alias, but I am confused about that.
Would you please explain it for me?
Thanks a lot.

Best.

In other words, what does the index in 'probY' represent for each sentence?

Through this pipeline: https://github.com/malllabiisc/RESIDE/blob/master/images/relation_alias.png
we get a probability distribution over relations for each sentence in the dataset,
which is used as side information. That's what ProbY means.
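
To make "a probability distribution over relations" concrete, here is a rough sketch. The thread does not say how the scores are computed, so everything below is an assumption for illustration only: the helper names, the toy vectors, and the choice of cosine similarity followed by a softmax are hypothetical stand-ins, not the actual RESIDE pipeline shown in the linked image.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relation_distribution(phrase_vec, alias_vecs, temperature=1.0):
    """Turn similarities between a sentence's relation phrase and each
    relation's alias embedding into a distribution over relations.

    Assumption: softmax over cosine similarities. This is a stand-in,
    not the method confirmed in the thread.
    """
    sims = [cosine(phrase_vec, v) / temperature for v in alias_vecs]
    shift = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: one phrase embedding, three relations with made-up alias embeddings
phrase = [1.0, 0.0]
aliases = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
dist = relation_distribution(phrase, aliases)
```

In this toy case index `i` of `dist` is the weight given to relation `i` for that sentence, with the most similar alias (relation 0) receiving the largest share.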