cvignac / DiGress

Code for the paper "DiGress: Discrete Denoising diffusion for graph generation"

New dataset - graph labels

feandalo opened this issue

Hello!

I implemented my own dataset class, and it worked quite well.
However, when I try to add graph labels, i.e., populate 'y' in the data loader with one-hot encoded vectors, I get a mismatch in size in the forward function of the transformer model, when computing self.mlp_out_y.

The problem is not in the data loader itself: the batches are generated correctly. But when I print the tensor shapes in the forward method, 'X' and 'E' have the correct shape while 'y' does not. From the second batch onward, 'y' has only one dimension and is filled with 0s. The first batch seems to be correct.

I thought it was a problem with my implementation, but I tried changing the size of 'y' in the spectre dataset (y = torch.zeros([1, 2]).float()) in __getitem__(self, idx), for example, and the same problem occurred. The first batch is ok, and for the second batch, the shape of 'y' is incorrect in the forward method.

So my two questions are:

  1. Is it possible to train DiGress with categorical graph labels?
  2. If yes, do you know how to overcome this problem?

Thank you! :)

Hello, you need to change the function compute_input_output_dims in datasets/abstract_dataset.py. On line 137, the dimension of y is currently set to 0; you can adapt it to your setting.
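
To illustrate the idea, here is a minimal sketch, assuming the model reads its layer sizes from dictionaries keyed by 'X', 'E' and 'y'. The helper name and its arguments are hypothetical and not part of the repository; the extra input slot for the diffusion time step is also an assumption, and any extra features the real function adds to the input sizes are left out.

```python
def dims_with_graph_labels(num_node_types, num_edge_types, num_graph_classes):
    """Hypothetical helper: layer sizes when 'y' carries a one-hot graph label."""
    input_dims = {
        "X": num_node_types,
        "E": num_edge_types,
        # Assumption: one extra scalar (the diffusion time step) is appended to y.
        "y": num_graph_classes + 1,
    }
    output_dims = {
        "X": num_node_types,
        "E": num_edge_types,
        # Per the reply above, this entry is 0 by default. Here it is set to
        # the width of the one-hot label, as one possible adaptation.
        "y": num_graph_classes,
    }
    return input_dims, output_dims


# Example: 4 node types, 2 edge types, 3 graph classes.
input_dims, output_dims = dims_with_graph_labels(4, 2, 3)
print(input_dims)
print(output_dims)
```

The point of the sketch is only that the 'y' entries have to match the width of the labels produced by the data loader; the right values depend on your dataset.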

Thanks, @cvignac !
I have a related question if you could help me again :)

Is there an easy way to assign two sets of attributes to the nodes? I.e., each node would have two one-hot encoded vectors, representing different attributes.

Thanks!

Hello @feandalo,
Yes, it is possible. It takes quite a lot of lines of code, but nothing difficult to implement. You can check in the code of MiDi (https://github.com/cvignac/midi) how we handle it. There, we use two sets of attributes on the atoms: the atom types and the formal charges.
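
As a rough illustration of what "two one-hot vectors per node" can look like, here is a plain-Python sketch. It is one possible layout and not necessarily how MiDi implements it; all function names are hypothetical, and in practice these would be tensors rather than lists. The thing to keep in mind is that the concatenated vector is not a single one-hot vector, so each block has to be encoded, noised and decoded on its own.

```python
def one_hot(index, num_classes):
    """Return a one-hot list of length num_classes with a 1.0 at position index."""
    vec = [0.0] * num_classes
    vec[index] = 1.0
    return vec


def encode_node(atom_type, charge_class, num_atom_types, num_charge_classes):
    """Concatenate two one-hot blocks: [atom type | charge class]."""
    return one_hot(atom_type, num_atom_types) + one_hot(charge_class, num_charge_classes)


def decode_node(features, num_atom_types):
    """Split the vector back into its two blocks and take the argmax of each."""
    type_block = features[:num_atom_types]
    charge_block = features[num_atom_types:]
    return type_block.index(max(type_block)), charge_block.index(max(charge_block))


# Example: atom type 2 out of 4, charge class 0 out of 3.
node = encode_node(2, 0, num_atom_types=4, num_charge_classes=3)
print(node)
print(decode_node(node, num_atom_types=4))
```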