Wuyxin / DIR-GNN

Official code of "Discovering Invariant Rationales for Graph Neural Networks" (ICLR 2022)

Home Page: https://arxiv.org/abs/2201.12872


Issues with calculating precision@K

AGTSAAA opened this issue

The way you calculate precision@K may be wrong, since you should divide by K in the last line instead of by num_gd. I also want to ask: what are the meanings of C and E?

num_gd = int(ground_truth_mask[C: C + E].sum())
pred = pred_weight[C:C + E]
_, indices_for_sort = pred.sort(descending=True, dim=-1)
idx = indices_for_sort[:num_gd].detach().cpu().numpy()
precision.append(ground_truth_mask[C: C + E][idx].sum().float()/num_gd)

Thank you very much!

Hi,

C is the start index of a graph's edges and E is the number of edges in that graph. Basically, we are splitting one graph out of a batch to compute the metric, as in the sketch below.
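
For concreteness, here is a minimal sketch of how such (C, E) slices could be derived when the edges of all graphs in a batch are stored contiguously (PyG-style); the tensor names and edge counts are my own illustration, not the repo's code:

```python
import torch

# Hypothetical batch of 3 graphs whose edges are stored back-to-back
# in one flat edge list: 4, 6, and 5 edges respectively.
num_edges_per_graph = torch.tensor([4, 6, 5])

# C for each graph is the offset of its first edge in the flat list;
# E is its edge count.
offsets = torch.cat([torch.zeros(1, dtype=torch.long),
                     num_edges_per_graph.cumsum(0)[:-1]])

for C, E in zip(offsets.tolist(), num_edges_per_graph.tolist()):
    # ground_truth_mask[C: C + E] and pred_weight[C: C + E] would then
    # slice out exactly the edges belonging to this graph.
    print(f"this graph's edges occupy [{C}, {C + E})")
```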

Detailed explanation:

ground_truth_mask[C: C + E] is the ground-truth causal mask.
ground_truth_mask[C: C + E][idx] selects the ground-truth values at the edges predicted by DIR.

hits = sum(ground_truth_mask[C: C + E][idx])
total = sum(ground_truth_mask[C: C + E])

precision = hits / total
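
Putting the pieces together, a self-contained toy version of the metric as described (my own toy tensors; variable names follow the snippet in the question):

```python
import torch

# One graph's slice [C: C + E] of the batch, as toy data.
ground_truth_mask = torch.tensor([0., 0., 1., 1., 1.])  # 1 = causal edge
pred_weight = torch.tensor([0.9, 0.8, 0.7, 0.2, 0.1])   # DIR edge scores

num_gd = int(ground_truth_mask.sum())  # number of ground-truth causal edges
_, indices_for_sort = pred_weight.sort(descending=True, dim=-1)
idx = indices_for_sort[:num_gd].detach().cpu().numpy()

hits = ground_truth_mask[idx].sum()  # selected edges that are truly causal
total = ground_truth_mask.sum()      # all truly causal edges (== num_gd)
print((hits / total).item())         # 0.333...
```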

We don't need to divide by K, since here K = 5, which is the number of edges in a circle motif. But you are right: it is more appropriate to just call it Precision, since crane and house motifs have 6 edges, not exactly 5; alternatively, we could change num_gd to 5. Either way, we keep the evaluation the same for all the baselines, so the conclusion should be the same. I will correct it in the code.

Thanks for pointing that out!

Thank you very much for your reply! I believe what you originally computed is actually Recall, not Precision.

Note that we set the number of causal edges predicted by DIR equal to the number of ground-truth edges, thus FN = FP.

Example:

ground truth mask = [0, 0, 1, 1, 1]

predicted mask = [1, 1, 1, 0, 0]

TP = 1
FN = FP = 2
TN = 0

precision = recall = 1 / 3
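
A quick check of this arithmetic in plain Python (no dependencies):

```python
gt   = [0, 0, 1, 1, 1]  # ground truth mask
pred = [1, 1, 1, 0, 0]  # predicted mask, same number of 1s as gt

TP = sum(1 for p, g in zip(pred, gt) if p and g)      # 1
FP = sum(1 for p, g in zip(pred, gt) if p and not g)  # 2
FN = sum(1 for p, g in zip(pred, gt) if not p and g)  # 2

precision = TP / (TP + FP)  # 1/3
recall    = TP / (TP + FN)  # 1/3
assert precision == recall  # holds whenever #predicted == #ground truth
```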

Yes. Precision = Recall holds if and only if we set the number of causal edges (the top K) predicted by DIR equal to the number of ground-truth edges. But the number of ground-truth edges varies per sample, so you cannot call it Precision@5, since crane and house motifs have 6 edges, not exactly 5. That is, the length of

idx = indices_for_sort[:num_gd].detach().cpu().numpy()

varies, whereas it should be the same $K$ for every prediction list in a Precision@K calculation. The better way is to change num_gd to 5 in both the prediction and ground-truth lists if you want to call it Precision@5.
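
For illustration, a sketch of what that fixed-K variant could look like (my own rewrite under this suggestion, not the repo's code; K = 5 comes from the discussion above):

```python
import torch

def precision_at_k(ground_truth_mask, pred_weight, k=5):
    """Hits among the top-k scored edges, divided by k (not by num_gd)."""
    _, order = pred_weight.sort(descending=True, dim=-1)
    return ground_truth_mask[order[:k]].sum().float() / k

# A house motif has 6 ground-truth edges; k stays 5 regardless.
gt = torch.tensor([1., 1., 1., 1., 1., 1., 0., 0.])
scores = torch.tensor([.9, .8, .7, .6, .5, .4, .3, .2])
print(precision_at_k(gt, scores).item())  # 1.0 -- all top-5 edges are causal
```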

I see; you mentioned two separate problems.

The varying number of ground-truth edges doesn't change the fact that it is the average Precision by definition (as long as #gd = #selected, Recall and Precision coincide, so I don't see that as a problem), but it does affect whether it can be called Precision@K.

In conclusion, the argument is not between Recall and Precision but between Precision and Precision@K. And the correct term is Precision, imo. Does that make sense to you?

Yes. It is just not the Precision@5 reported in your paper.