Do you know how to write my own dataset or use TUDataset to process it into a trainable data format?

Question

Shengyuan-Cai opened this issue 3 years ago · comments

Hello, I have built and preprocessed a dataset myself and want to train on it with TG. Do you know how to write my own dataset class, or how to use TUDataset, to convert it into a trainable data format? The dataset has 72 graphs with 63 nodes each, and every node has 2300 features that are not one-hot encoded.

Christopher Morris · Answer 1 · Thu May 13 2021 03:43:44 GMT+0800 (China Standard Time)

What is TG?

Shengyuan Cai · Answer 2 · Thu May 13 2021 04:31:01 GMT+0800 (China Standard Time)

> What is TG?

torch_geometric

Christopher Morris · Answer 3 · Thu May 13 2021 05:14:26 GMT+0800 (China Standard Time)

Have a look at the TG documentation.

Shengyuan Cai · Answer 4 · Thu May 13 2021 10:54:42 GMT+0800 (China Standard Time)

> Have a look at the TG documentation.

This is the demo from the torch_geometric documentation, but I don't understand the details of the methods. Could you give me some tips?

import torch
from torch_geometric.data import InMemoryDataset, download_url

class MyOwnDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None):
        super(MyOwnDataset, self).__init__(root, transform, pre_transform)
        # Load the collated dataset that `process()` saved to disk.
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        # Files expected in `self.raw_dir`; `download()` runs if any is missing.
        return ['some_file_1', 'some_file_2', ...]

    @property
    def processed_file_names(self):
        # Files expected in `self.processed_dir`; `process()` runs if any is missing.
        return ['data.pt']

    def download(self):
        # Download to `self.raw_dir`. `url` is a placeholder in this demo.
        download_url(url, self.raw_dir)
        ...

    def process(self):
        # Read data into huge `Data` list.
        data_list = [...]

        if self.pre_filter is not None:
            data_list = [data for data in data_list if self.pre_filter(data)]

        if self.pre_transform is not None:
            data_list = [self.pre_transform(data) for data in data_list]

        # Merge all graphs into one big `Data` object plus slice indices.
        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])

Christopher Morris · Answer 5 · Thu May 13 2021 11:21:02 GMT+0800 (China Standard Time)

Have a look at https://github.com/chrsmrrs/sparsewl/blob/master/neural_higher_order/ZINC/gnn_1_10K.py.

This is not a support forum for TG.

Shengyuan Cai · Answer 6 · Thu May 13 2021 17:00:08 GMT+0800 (China Standard Time)

> Have a look at https://github.com/chrsmrrs/sparsewl/blob/master/neural_higher_order/ZINC/gnn_1_10K.py.
>
> This is not a support forum for TG.

Thank you for your answer. I would be very grateful if you could help me once more.

I now understand the TUDataset layout and have converted my raw data into the same format. How should I write the dataset class that processes these files into a form pytorch_geometric can use?
I have prepared the raw data like this:

(1) XX_A.txt (m lines)
sparse (block diagonal) adjacency matrix for all graphs,
each line corresponds to (row, col) resp. (node_id, node_id)

(2) XX_graph_indicator.txt (n lines)
column vector of graph identifiers for all nodes of all graphs,
the value in the i-th line is the graph_id of the node with node_id i

(3) XX_graph_labels.txt (N lines)
class labels for all graphs in the dataset,
the value in the i-th line is the class label of the graph with graph_id i

(4) XX_node_attributes.txt (n lines)
matrix of node attributes,
the comma-separated values in the i-th line are the attribute vector of the node with node_id i
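
The four files above could be parsed along the following lines. This is only a minimal sketch in plain Python, not the TUDataset implementation: the helper name `read_tu_files` is hypothetical, and it assumes that node and graph ids start at 1 and that values within a line are separated by commas, as in the public TU datasets. The tiny two-graph demo files are made up purely for illustration.

```python
import os
import tempfile


def read_tu_files(folder, prefix):
    """Parse TU-style text files into one dict per graph.

    Assumption: node ids and graph ids are 1-based and values are
    comma-separated.
    """
    def lines(suffix):
        path = os.path.join(folder, f"{prefix}_{suffix}.txt")
        with open(path) as fh:
            return [ln.strip() for ln in fh if ln.strip()]

    # Graph id of every node, shifted to 0-based.
    node_graph = [int(v) - 1 for v in lines("graph_indicator")]
    labels = [int(v) for v in lines("graph_labels")]
    attrs = [[float(v) for v in ln.split(",")]
             for ln in lines("node_attributes")]

    graphs = [{"x": [], "edge_index": [[], []], "y": y} for y in labels]

    # Position of each node inside its own graph (global id -> local id).
    local_id = []
    for node, g in enumerate(node_graph):
        local_id.append(len(graphs[g]["x"]))
        graphs[g]["x"].append(attrs[node])

    for ln in lines("A"):
        src, dst = (int(v) - 1 for v in ln.split(","))
        g = node_graph[src]
        # Block-diagonal adjacency: both endpoints lie in the same graph.
        assert g == node_graph[dst], "edge crosses two graphs"
        graphs[g]["edge_index"][0].append(local_id[src])
        graphs[g]["edge_index"][1].append(local_id[dst])
    return graphs


# Demo: write a made-up dataset with two graphs (2 and 3 nodes).
demo_dir = tempfile.mkdtemp()
demo_files = {
    "A": "1, 2\n2, 1\n3, 4\n4, 3\n4, 5\n5, 4\n",
    "graph_indicator": "1\n1\n2\n2\n2\n",
    "graph_labels": "0\n1\n",
    "node_attributes": "0.1, 0.2\n0.3, 0.4\n0.5, 0.6\n0.7, 0.8\n0.9, 1.0\n",
}
for suffix, text in demo_files.items():
    with open(os.path.join(demo_dir, f"XX_{suffix}.txt"), "w") as fh:
        fh.write(text)

graphs = read_tu_files(demo_dir, "XX")
print(len(graphs), [len(g["x"]) for g in graphs])
```

Inside `process()` of a class like `MyOwnDataset`, each dict could then be turned into a graph object, for example `Data(x=torch.tensor(g["x"], dtype=torch.float), edge_index=torch.tensor(g["edge_index"], dtype=torch.long), y=torch.tensor([g["y"]]))` with `Data` imported from `torch_geometric.data`, and the resulting list passed to `self.collate`. Since the files already follow the TU layout, it may also be possible to skip the custom class entirely by placing them under `<root>/XX/raw/` and calling `TUDataset(root, name='XX', use_node_attr=True)`; that relies on the download being skipped when the raw files are already present, which is worth checking against the installed torch_geometric version.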