cvignac / DiGress

code for the paper "DiGress: Discrete Denoising diffusion for graph generation"


Cross-entropy in minibatching

najwalb opened this issue · comments

I have a question about your use of cross-entropy over nodes/edges when mini-batching graphs. If I understood your implementation correctly, to compute the loss for one minibatch you compute the cross-entropy of each node and each edge in the minibatch of graphs, average these cross-entropies over the entire minibatch, and then combine them via the following formula: $L_{ce} = L_{nodes} + \lambda L_{edges}$ (same as your Equation 3).

To me $L_{ce}$ represents the loss for one graph, so I think you should first sum the losses for nodes and edges per graph, then take the mean of such sums over a minibatch. What do you think?
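For concreteness, a minimal sketch of the batch-wide averaging described above, assuming dense padded tensors with boolean masks; the names, shapes, and the helper `batch_ce` are illustrative assumptions, not the repository's actual code:

```python
import torch.nn.functional as F

def batch_ce(node_logits, node_targets, edge_logits, edge_targets,
             node_mask, edge_mask, lam=1.0):
    """One mean over every valid node / edge of the whole minibatch,
    then L_nodes + lambda * L_edges as in Equation 3."""
    # node_logits: (B, N, Cx), node_targets: (B, N), node_mask: (B, N) bool
    node_ce = F.cross_entropy(node_logits[node_mask], node_targets[node_mask])
    # edge_logits: (B, N, N, Ce), edge_targets: (B, N, N), edge_mask: (B, N, N) bool
    edge_ce = F.cross_entropy(edge_logits[edge_mask], edge_targets[edge_mask])
    return node_ce + lam * edge_ce
```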

Cool, thanks for your reply. I am not sure I understand the intuition behind why summing over all nodes and edges favors bigger graphs. Is it because, with my method, bigger graphs would have a higher CE and would thus be penalized more?

I am actually proposing a sum over the nodes and edges of one graph, then taking the mean over the batch. So the loss per batch would be: $L = \frac{1}{N} \sum_{i=1}^{N} \left( \sum_{n} CE_{n,i} + \lambda \sum_{n,m} CE_{nm,i} \right)$, where $n, m$ are vertices of graph $i$ and $N$ is the number of graphs in the batch. This way, big graphs will still have a large loss and will contribute more to the batch loss, no?
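A rough sketch of this per-graph alternative, under the same assumptions as the snippet above (per-element cross-entropies kept unreduced and summed within each graph before averaging over the batch; the helper `per_graph_ce` is hypothetical):

```python
import torch.nn.functional as F

def per_graph_ce(node_logits, node_targets, edge_logits, edge_targets,
                 node_mask, edge_mask, lam=1.0):
    """Sum the CE over the nodes and edges of each graph, then average over the batch."""
    # Per-element CE, kept unreduced so it can be summed inside each graph.
    # Padded positions are assumed to hold a valid (dummy) class index; they are
    # zeroed out by the masks below.
    node_ce = F.cross_entropy(node_logits.transpose(1, 2), node_targets,
                              reduction='none')                # (B, N)
    edge_ce = F.cross_entropy(edge_logits.permute(0, 3, 1, 2), edge_targets,
                              reduction='none')                # (B, N, N)
    # sum_n CE_{n,i} + lambda * sum_{n,m} CE_{nm,i} for every graph i
    per_graph = (node_ce * node_mask.float()).sum(dim=1) \
                + lam * (edge_ce * edge_mask.float()).sum(dim=(1, 2))
    # Mean over the B graphs of the minibatch: bigger graphs keep a larger loss.
    return per_graph.mean()
```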

Ok, I see your point. Did you try it? Does it work better?