Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs

UPDATE: The pt file for Citeseer has some problems. Please use the latest version, citeseer2, instead of the version inside small_data.zip. We use Graph Cleaner to fix wrong labels.

This is the official code repository for our paper "Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs".

Follow-up works: LLM-GNN, TSGFM

Introduction

Learning on Graphs has attracted immense attention due to its wide real-world applications. The most popular pipeline for learning on graphs with textual node attributes primarily relies on Graph Neural Networks (GNNs), and utilizes shallow text embedding as initial node representations, which has limitations in general knowledge and profound semantic understanding. In recent years, Large Language Models (LLMs) have been proven to possess extensive common knowledge and powerful semantic comprehension abilities that have revolutionized existing workflows to handle text data. In this paper, we aim to explore the potential of LLMs in graph machine learning, especially the node classification task, and investigate two possible pipelines: LLMs-as-Enhancers and LLMs-as-Predictors. The former leverages LLMs to enhance nodes' text attributes with their massive knowledge and then generate predictions through GNNs. The latter attempts to directly employ LLMs as standalone predictors. We conduct comprehensive and systematic studies on these two pipelines under various settings. From comprehensive empirical results, we make original observations and find new insights that open new possibilities and suggest promising directions to leverage LLMs for learning on graphs.

We provide the implementation of the following pipelines.

LLMs-as-Predictors

See ego_graph.py, which queries ChatGPT directly for zero-shot and few-shot predictions.
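As a rough illustration of the zero-shot setting, the sketch below assembles a classification prompt from a node's text and a few neighbor texts. The function name, the prompt wording, and the sample inputs are all assumptions made for this example; the prompts actually used in ego_graph.py may differ.

```python
from typing import Sequence


def build_zero_shot_prompt(
    node_text: str,
    neighbor_texts: Sequence[str],
    label_names: Sequence[str],
    max_neighbors: int = 3,
) -> str:
    """Assemble a zero-shot node-classification prompt (hypothetical format)."""
    lines = [f"Paper: {node_text.strip()}"]
    # Expose a few neighbors so the LLM sees some of the graph structure.
    for i, text in enumerate(neighbor_texts[:max_neighbors], start=1):
        lines.append(f"Cited paper {i}: {text.strip()}")
    lines.append("Categories: " + ", ".join(label_names))
    lines.append(
        "Question: Which category does the paper belong to? "
        "Answer with one category name."
    )
    return "\n".join(lines)


# Illustrative inputs only; they are not taken from the datasets.
prompt = build_zero_shot_prompt(
    "A study of message passing on citation graphs.",
    ["Semi-supervised classification with graph convolutions."],
    ["Neural_Networks", "Theory"],
)
print(prompt)
```

A few-shot variant would prepend a handful of labeled examples in the same format before the final question.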

LLMs-as-Enhancers

See baseline.py. Various embedding-visible LLMs (such as LLaMA, SentenceBERT, or text-embedding-ada-002) can be used to generate embeddings that serve as node features.
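The idea can be sketched as follows: every node's text is passed through an encoder, and row i of the resulting matrix becomes the feature vector of node i. The helper build_node_features and the stand-in toy_encode are hypothetical names introduced for this example, and the toy encoder only exists to keep the sketch dependency-free; it is not how baseline.py computes embeddings.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def build_node_features(
    texts: Sequence[str],
    encode: Callable[[Sequence[str]], Sequence[Sequence[float]]],
) -> List[Vector]:
    """Encode each node's text into one feature row; row i belongs to node i."""
    rows = [list(map(float, row)) for row in encode(list(texts))]
    if len(rows) != len(texts):
        raise ValueError("encoder must return one vector per node")
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all embeddings must share one dimension")
    return rows


def toy_encode(batch: Sequence[str]) -> List[Vector]:
    """Stand-in encoder for illustration only: [character count, word count]."""
    return [[float(len(t)), float(len(t.split()))] for t in batch]


features = build_node_features(
    ["graph neural networks", "language models"], toy_encode
)
print(features)
```

With sentence-transformers installed, the encode method of a SentenceTransformer model could be passed in place of toy_encode, and the resulting matrix would then be handed to the GNN as its input features.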

(New project) LLMs-as-Annotators

Check out our new project here: Label-free Node Classification on Graphs with Large Language Models (LLMs)

Citation

@article{Chen2023ExploringTP,
  title={Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs},
  author={Zhikai Chen and Haitao Mao and Hang Li and Wei Jin and Haifang Wen and Xiaochi Wei and Shuaiqiang Wang and Dawei Yin and Wenqi Fan and Hui Liu and Jiliang Tang},
  journal={ArXiv},
  year={2023},
  volume={abs/2307.03393}
}

0. Environment Setup

Package Installation

Assuming your CUDA version is 11.8:

conda create --name LLMGNN python=3.10
conda activate LLMGNN

conda install pytorch==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install -c pyg pytorch-sparse
conda install -c pyg pytorch-scatter
conda install -c pyg pytorch-cluster
conda install -c pyg pyg
pip install ogb
conda install -c dglteam/label/cu118 dgl
pip install transformers
pip install --upgrade accelerate
pip install openai
pip install langchain
pip install gensim
pip install google-generativeai
pip install -U sentence-transformers
pip install editdistance
pip install InstructorEmbedding
pip install optuna
pip install tiktoken
pip install pytorch_warmup

Dataset

We provide the processed datasets via the following Google Drive link.

To set up the files:

Unzip small_data.zip into preprocessed_data/new.
If you want to use ogbn-products, unzip big_data.zip into preprocessed_data/new.
Download *_explanation.pt and *_pl.pt and move them into preprocessed_data/new. These files are related to TAPE.
Unzip ada.zip into ./
Move *_entity.pt into ./
Put ogb_arxiv.csv into ./preprocessed_data
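Before running the experiments, it can help to confirm that the files ended up where the steps above put them. The following is a minimal sketch that only checks the locations named in those steps; the function name missing_dataset_items is hypothetical, and the contents of the zip archives are not checked because they are not listed here.

```python
from pathlib import Path
from typing import List


def missing_dataset_items(root: str = ".") -> List[str]:
    """Return each expected directory, file, or pattern not found under root."""
    base = Path(root)
    data_dir = base / "preprocessed_data" / "new"
    missing = []
    has_dir = data_dir.is_dir()
    if not has_dir:
        missing.append(str(data_dir))
    # TAPE-related files are expected inside preprocessed_data/new.
    for pattern in ("*_explanation.pt", "*_pl.pt"):
        if not (has_dir and any(data_dir.glob(pattern))):
            missing.append(str(data_dir / pattern))
    # Entity files are expected in the repository root.
    if not any(base.glob("*_entity.pt")):
        missing.append(str(base / "*_entity.pt"))
    csv_path = base / "preprocessed_data" / "ogb_arxiv.csv"
    if not csv_path.is_file():
        missing.append(str(csv_path))
    return missing


if __name__ == "__main__":
    for item in missing_dataset_items("."):
        print("missing:", item)
```

Run it from the repository root; an empty output means every location named above is populated.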

Get ft and no-ft LM embeddings

Refer to the following script:

for setting in "random"
do 
    for data in "cora" "pubmed"
    do
        WANDB_DISABLED=True CUDA_VISIBLE_DEVICES=3 python3 lmfinetune.py --dataset $data --split $setting --batch_size=9 --label_smoothing 0.3 --seed_num 5 
        WANDB_DISABLED=True CUDA_VISIBLE_DEVICES=3 python3 lmfinetune.py --dataset $data --split $setting --batch_size=9 --label_smoothing 0.3 --seed_num 5 --use_explanation 1
    done
done

Generate pt files for all data formats

Run

python3 generate_pyg_data.py

1. Experiments for LLMs-as-Enhancers

For feature-level LLMs-as-Enhancers, you can replicate the experiments using baseline.py and lmfinetune.py.

For example, you can run a hyperparameter sweep with the following script:

for model in "GCN" "GAT" "MLP"
do
    for data in "cora" "pubmed"
    do 
        for setting in "random"
        do 
        # Add more formats here
            for format in "ft"
            do 
                CUDA_VISIBLE_DEVICES=1 python3 baseline.py --model_name $model  --seed_num 5 --sweep_round 40  --mode sweep --dataset $data --split $setting --data_format $format
                echo "$model $data $setting $format done"
            done
        done
    done
done
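The same grid can also be generated from Python, which is convenient when the list of models or formats grows. This is only a sketch of the loop above: the function name sweep_commands is hypothetical, while the flags and values are the ones used in the shell script.

```python
from itertools import product
from typing import Iterable, List


def sweep_commands(
    models: Iterable[str] = ("GCN", "GAT", "MLP"),
    datasets: Iterable[str] = ("cora", "pubmed"),
    splits: Iterable[str] = ("random",),
    formats: Iterable[str] = ("ft",),
) -> List[List[str]]:
    """Build one baseline.py sweep command per (model, dataset, split, format)."""
    commands = []
    # product() iterates in the same nesting order as the shell loops.
    for model, data, split, fmt in product(models, datasets, splits, formats):
        commands.append([
            "python3", "baseline.py",
            "--model_name", model,
            "--seed_num", "5",
            "--sweep_round", "40",
            "--mode", "sweep",
            "--dataset", data,
            "--split", split,
            "--data_format", fmt,
        ])
    return commands


for cmd in sweep_commands():
    print(" ".join(cmd))
```

Each command list can be passed to subprocess.run(cmd, check=True), with CUDA_VISIBLE_DEVICES set through the env argument to pick a GPU.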

Run with a specific group of hyperparameters

python3 baseline.py --data_format sbert --split random --dataset pubmed --lr 0.01 --seed_num 5

Feature ensemble: separate the ensemble formats with ";" (escaped as \; in the shell)

CUDA_VISIBLE_DEVICES=1 python3 baseline.py --model_name GCN --num_split 1 --seed_num 5 --sweep_split 1 --sweep_round 5 --mode sweep --dataset pubmed --split random --ensemble_string sbert\;know_sep_sb\;ft\;pl\;know_exp_ft
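The backslashes in the command exist only to stop the shell from treating ";" as a command separator. The sketch below uses Python's shlex module to mimic POSIX shell tokenization and show the value the script receives; the final split on ";" is an assumption about how the string is consumed, not a copy of the parsing code in baseline.py.

```python
import shlex

# POSIX-style tokenization drops the backslashes, so the script
# receives a single argument with plain ";" separators.
argv = shlex.split(r"--ensemble_string sbert\;know_sep_sb\;ft\;pl\;know_exp_ft")
ensemble_string = argv[1]

# Assumed parsing step: split on ";" to recover the individual formats.
formats = ensemble_string.split(";")
print(formats)
```

Quoting the whole value, as in --ensemble_string "sbert;know_sep_sb;ft;pl;know_exp_ft", passes the same argument to the script.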

Batch version for ogbn-products

CUDA_VISIBLE_DEVICES=7 python3 baseline.py --model_name SAGE --epochs 10 --num_split 1 --batchify 1  --dataset products --split fixed --data_format ft --normalize 1 --norm BatchNorm --mode main --lr 0.003 --dropout 0.5 --weight_decay 0 --hidden_dimension 256 --num_layers 3

To replicate the results for RevGAT (you first need to run once with the default features to generate the DGL data):

python dgl_main.py --data_root_dir ./dgldata \
--pretrain_path  ./preprocessed_data/new/arxiv_fixed_sbert.pt \
--use-norm --use-labels --n-label-iters=1 --no-attn-dst --edge-drop=0.3 --input-drop=0.25 --n-layers 2 --dropout 0.75 --n-hidden 256 --save kd --backbone rev --group 2 --mode teacher


python dgl_main.py --data_root_dir ./dgldata \
--pretrain_path  ./preprocessed_data/new/arxiv_fixed_sbert.pt \
--use-norm --use-labels --n-label-iters=1 --no-attn-dst --edge-drop=0.3 --input-drop=0.25 --n-layers 2 --dropout 0.75 --n-hidden 256 --save kd --backbone rev --group 2 --mode student --alpha 0.95 --temp 0.7

To replicate the results for SAGN and GLEM, you may check their repositories and put the processed pt file into their pipelines.

2. Experiments for LLMs-as-Predictors

Just run

python3 ego_graph.py

3. (UPDATE) Further Experiments on OOD & Prompts

Two recent studies, "Can LLMs Effectively Leverage Graph Structural Information: When and Why" and "Explanations as Features: LLM-Based Features for Text-Attributed Graphs", tested a prompt tailored to an Arxiv dataset containing post-2023 data, which ChatGPT's pre-training corpus does not cover. Notably, the results showed no decline in performance compared to the original dataset. This outcome prompts us to look more closely at how to design effective prompts across different domains.

Out-of-distribution (OOD) generalization on graphs, commonly known as Graph OOD, is an active research area. Recent benchmarks, such as GOOD, indicate that GNNs perform poorly under structural and feature shifts. We ran an experiment on the Arxiv dataset to assess LLMs-as-Predictors under such shifts, using a prompt from "Explanations as Features: LLM-Based Features for Text-Attributed Graphs", which exhibited superior performance.

Setting            All avg        Val     Test    Best baseline (test)
concept degree     73.91 ± 0.63   73.01   72.79   63.00
covariate degree   75.75 ± 3.6    70.23   68.21   59.08
concept time       74.29 ± 0.96   72.66   71.98   67.45
covariate time     72.69 ± 1.53   74.28   74.37   71.34

Concept shift: P(Y|X) varies across domains. The split is constructed on top of the covariate-shift setting by adjusting the ratios in each domain.
Covariate shift: P(X) shifts while P(Y|X) stays the same.

The covariate-shift setting uses 10/1/1 environments (train/val/test), and the concept-shift setting uses 3/1/1 (train/val/test). "All avg" is the mean performance across all environments.

One discernible merit of using LLMs-as-Predictors is their heightened resilience to OOD shifts.
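To make the comparison concrete, the short script below computes the gap between the LLMs-as-Predictors test score and the best baseline test score for each setting. The numbers are copied from the table above; the variable names are just for this example.

```python
# (LLMs-as-Predictors test score, best baseline test score) per setting,
# copied from the results table.
results = {
    "concept degree": (72.79, 63.00),
    "covariate degree": (68.21, 59.08),
    "concept time": (71.98, 67.45),
    "covariate time": (74.37, 71.34),
}

# Round to two decimals to match the precision of the reported scores.
gaps = {name: round(llm - base, 2) for name, (llm, base) in results.items()}
for name, gap in gaps.items():
    print(f"{name}: +{gap:.2f}")
```

The gap is positive in all four settings. It is larger for the two degree-based shifts (above 9 points) than for the two time-based shifts (roughly 3 to 4.5 points).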
