qitianwu / DIFFormer

The official implementation of the ICLR 2023 spotlight paper "DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion"

Batch computation of Difformer

tongnie opened this issue

Hi! Thanks for presenting such interesting work and sharing your code! I'm very interested in DIFFormer and I'd like to conduct spatial-temporal prediction tasks based on it. In practice, however, the input for spatial-temporal datasets typically has shape [batch_size, sequence_length, number_of_nodes, feature_dimension], where each mini-batch is split along the sequence dimension rather than along the node dimension as in typical GNN problems. Is DIFFormer applicable in this case, and what adjustments to your implementation might be needed?

Looking forward to your reply and I appreciate it greatly!

Hi Tong, our current experiments for the spatial-temporal case only use the previous graph snapshot as input for predicting the next state, and we feed one graph snapshot at a time into the model's forward pass. So, in our model implementation (difformer.py), the input data has shape [number_of_nodes, feature_dimension], the same as in the other two tasks.
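
For concreteness, here is a minimal sketch of this snapshot-by-snapshot scheme; the model below is a placeholder linear layer, not the actual DIFFormer class, whose real forward pass also takes the graph structure:

```python
import torch

# Placeholder for the DIFFormer model; the real class in difformer.py
# also consumes the graph structure, which is omitted here.
model = torch.nn.Linear(16, 16)

# Sequence of graph snapshots, each of shape [number_of_nodes, feature_dimension].
snapshots = [torch.randn(207, 16) for _ in range(12)]

for t in range(len(snapshots) - 1):
    x_t = snapshots[t]         # previous snapshot: one [N, D] forward pass
    target = snapshots[t + 1]  # next state to predict
    pred = model(x_t)          # placeholder forward; the real call also needs edges
    loss = torch.nn.functional.mse_loss(pred, target)
```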

For the case you mentioned, where the input needs to be [batch_size, sequence_length, number_of_nodes, feature_dimension], I think the simplest approach is to add the two extra dimensions to the input x of the DIFFormer class and treat them independently, so that the all-pair attention is still applied along the node dimension. Moreover, you could also apply the all-pair attention along the sequence dimension to capture temporal dependence if needed. The full_attention_conv function in difformer.py is flexible for arbitrary query/key/value inputs, depending on the dimension you target for computing the attention.
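
As a rough illustration (my own sketch, not the repository's full_attention_conv, which uses a linear-complexity attention kernel), here is a plain softmax attention applied along the node dimension while the batch and sequence dimensions are treated independently, together with the index change that would instead attend along the sequence dimension:

```python
import torch

def all_pair_attention_over_nodes(qs, ks, vs):
    """All-pair attention along the node axis, independently per (batch, step).

    qs, ks, vs: [batch_size, seq_len, num_nodes, dim]
    Note: this is a quadratic softmax attention for clarity; the actual
    full_attention_conv in difformer.py uses a linear-complexity kernel.
    """
    scores = torch.einsum("bsnd,bsmd->bsnm", qs, ks) / qs.shape[-1] ** 0.5
    attn = scores.softmax(dim=-1)                     # weights over the m (node) axis
    return torch.einsum("bsnm,bsmd->bsnd", attn, vs)  # [B, S, N, D]

def all_pair_attention_over_steps(qs, ks, vs):
    """Same idea, but attending along the sequence axis for each node."""
    scores = torch.einsum("bsnd,btnd->bnst", qs, ks) / qs.shape[-1] ** 0.5
    attn = scores.softmax(dim=-1)                     # weights over the t (step) axis
    out = torch.einsum("bnst,btnd->bnsd", attn, vs)
    return out.permute(0, 2, 1, 3)                    # back to [B, S, N, D]

x = torch.randn(8, 12, 207, 16)  # [batch, seq_len, nodes, features]
print(all_pair_attention_over_nodes(x, x, x).shape)  # torch.Size([8, 12, 207, 16])
print(all_pair_attention_over_steps(x, x, x).shape)  # torch.Size([8, 12, 207, 16])
```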

Hope this illustration will be helpful!

Thanks for your kind suggestion! It is enlightening and I'll give it a try.

> Thanks for your kind suggestion! It is enlightening and I'll give it a try.

I tried reducing the feature dimension to 1 (keeping only the first feature), so the shape of x is [207, 12]. The code runs, but it is very slow. What could the problem be?

Could you share more details about the dataset you are using? What are the input graph, the edge sparsity, and the labels? Also, what input do you feed to the model, e.g., a single instance of shape [207, 12] or a batch of instances of that shape? Does 207 denote the number of nodes and 12 the input feature dimension? And along which dimension is the diffusion attention computed?