Why transpose the adjacency matrix for the Cora and Citeseer datasets?

Question

zhoushengisnoob opened this issue 5 years ago

Dear authors:
Thanks for your open-sourced code and this paper is great!
In section 3.3.1, the GCN encoder learns each node's features from the previous-layer features of the neighbors to which it points. However, in the code, the adjacency matrix is transposed for the Citeseer and Cora datasets but not for the Google dataset. As far as I know, this changes which neighbors the encoder aggregates information from (in-neighbors or out-neighbors).
Is there any motivation behind this? Thanks for your explanation!
Best
A fan of this paper:)

G. Salha-Galvan · Answer 1 · Sat Dec 28 2019 21:37:57 GMT+0800 (China Standard Time)

Dear @zhoushengisnoob,

First of all, thank you very much for your feedback!

In our experiments, we used the raw Cora and Citeseer graph datasets from LINQS. Their format is <ID of cited paper> <ID of citing paper>. In other words, each row of the data file corresponds to an edge, the first entry being the ID of the paper being cited while the second ID stands for the paper which contains the citation.

This format is reversed w.r.t. the usual edgelist format, which is <ID of citing paper> <ID of cited paper>. As a consequence, nx.read_edgelist() returns wrong edge directions for Cora and Citeseer: it treats the first ID of each row as the source node, so every edge ends up pointing from the cited paper to the citing paper. Transposing the resulting adjacency matrix is a simple way to retrieve the actual graph/adjacency matrix, with correct edge directions.
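To make the effect of the transpose concrete, here is a minimal sketch in plain Python. It is not the repository's loading code: the three toy rows, the paper IDs, and the helper names `adjacency_from_rows` and `transpose` are all made up for illustration. The rows are read with the usual "first ID is the source" convention, which reproduces the reversed directions described above, and the transpose then restores edges that point from the citing paper to the cited paper.

```python
# Toy rows in the LINQS layout "<ID of cited paper> <ID of citing paper>".
# Hypothetical data: paper 1 cites paper 0, paper 2 cites papers 0 and 1.
raw_rows = ["0 1", "0 2", "1 2"]


def adjacency_from_rows(rows, num_nodes):
    """Read each row as "<source> <target>", the usual edgelist convention."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for row in rows:
        source, target = (int(token) for token in row.split())
        adj[source][target] = 1
    return adj


def transpose(matrix):
    """Swap rows and columns, which reverses the direction of every edge."""
    return [list(column) for column in zip(*matrix)]


# Read naively: entry [i][j] = 1 means "i is cited by j" (reversed direction).
as_read = adjacency_from_rows(raw_rows, num_nodes=3)

# After the transpose: entry [i][j] = 1 means "i cites j" (intended direction).
corrected = transpose(as_read)

print(as_read)    # [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
print(corrected)  # [[0, 0, 0], [1, 0, 0], [1, 1, 0]]
```

In the corrected matrix, row 2 is `[1, 1, 0]`, i.e. paper 2 points to the two papers it cites, so an encoder that aggregates over the neighbors a node points to would see the cited papers.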

The Google dataset is directly provided in the standard edgelist format, so we did not need to transpose the adjacency matrix for this graph.

zhousheng · Answer 2 · Sat Dec 28 2019 23:51:58 GMT+0800 (China Standard Time)

Thanks for your reply, which helps me a lot!