LechengKong / OneForAll

A foundational graph learning framework that solves cross-domain/cross-task classification problems with a single model.


Using the embeddings from LLM and GCN to perform node classification, and the accuracy is much lower than reported in the paper.

QiaoYRan opened this issue · comments

First of all, thanks for your inspiring work!

In Table 3, node classification using LLM-encoded input features and a GCN appears to achieve an accuracy exceeding 81%. However, when I tried to replicate this result (using SentenceEncoder/LlaMa2-7b embeddings with a 4-layer GCN), I only achieved an accuracy around 30%. Could there be some important detail that I overlooked?

Hi @QiaoYRan , thank you for your interest in our work. We took the GCN/GAT results from the GAT paper, where the setting is original node features + GNN. We were also unable to fully reproduce their results, but we included them out of respect for their work. That said, a result near 30% seems exceptionally low. These node classification datasets are particularly difficult to train on because of the small training set size. I recommend taking a look at #6 , which is somewhat relevant. Also, if you can share your training details, I am happy to help figure out how to better reproduce the results.

Thank you for your reply. I followed the experiment settings in this repo (https://github.com/XiaoxinHe/TAPE?tab=readme-ov-file), only changing the input features to LLM embeddings and setting the learning rate to 0.002. Is there any chance that you have tried basic GNNs with LLM embeddings alone, without additional designs? I wonder whether the weak performance comes from my training or from the LLM embeddings themselves.

I think the LLM features are not the issue, assuming the preprocessing and raw text in the repo you mentioned are correct, since OFA uses LLM features and gets reasonably good performance. I would therefore suspect that something went wrong during training. Can you check whether your training accuracy reaches 100%? That is usually what happens when you train a GNN for this task. The number of epochs (given that each epoch loops over all training nodes once, without learning rate decay) should roughly be between 100 and 200 for convergence. Following this setup, my experience is that you should get at least ~60% accuracy without any regularization, and you can then add regularization (dropout/weight decay) to reach ~75%.
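For reference, the baseline being discussed (node features, e.g. LLM embeddings, fed through a plain GCN) can be sketched in a few lines. This is a minimal numpy sketch of the standard GCN propagation rule, not OFA's or TAPE's actual code; the toy graph, feature dimensions, and random weights are stand-ins for illustration only.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: relu(D^{-1/2} (A + I) D^{-1/2} H W).

    A: adjacency matrix, H: node features, W: learnable weights.
    (A real model would skip the ReLU on the final layer, and in training
    you would add dropout on H and weight decay on W as the regularizers
    mentioned above.)
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    D_inv_sqrt = np.diag(d_inv_sqrt)          # symmetric degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy path graph with 4 nodes; 8-dim features standing in for LLM embeddings.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, 8))                   # node features (e.g. LLM embeddings)
W1 = rng.normal(size=(8, 16))                 # hidden layer weights
W2 = rng.normal(size=(16, 3))                 # output layer weights (3 classes)
logits = gcn_layer(A, gcn_layer(A, H, W1), W2)
print(logits.shape)                           # per-node class scores, shape (4, 3)
```

The point of the sketch is only that the baseline has no extra machinery: whatever embeddings are used, the model is just normalized neighborhood averaging plus linear layers, so a large accuracy gap usually points at the training loop (epochs, learning rate, regularization) rather than the architecture.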

Thank you very much! I will try again.