THUDM / paper-source-trace

paper-source-trace

Prerequisites

  • Linux
  • Python 3.9
  • PyTorch 1.10.0+cu111

Getting Started

Installation

Clone this repo.

git clone https://github.com/THUDM/paper-source-trace.git
cd paper-source-trace

Install the dependencies with:

pip install -r requirements.txt

PST Dataset

The dataset can be downloaded from BaiduPan (password: bft3), Aliyun, or DropBox. The paper XML files were generated from the paper PDFs using the Grobid APIs.
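Because the paper XML files come from Grobid, they should follow Grobid's TEI format, in which each bibliography entry is a `biblStruct` element inside `listBibl`. The sketch below lists the reference titles found in one such document. It is only an illustration and not part of this repo: the function name `list_reference_titles` is hypothetical, and it assumes the files use the standard TEI namespace.

```python
import xml.etree.ElementTree as ET
from typing import List

# Standard TEI namespace used by Grobid output (assumed to apply to the PST XML files).
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def list_reference_titles(tei_xml: str) -> List[str]:
    """Return the first title of every bibliography entry in a TEI document.

    Hypothetical helper for illustration; not taken from the repo's code.
    """
    root = ET.fromstring(tei_xml)
    titles = []
    for bibl in root.iterfind(".//tei:listBibl/tei:biblStruct", TEI_NS):
        # An entry may carry its title under <analytic> or <monogr>;
        # take the first <title> element found in document order.
        title_el = bibl.find(".//tei:title", TEI_NS)
        if title_el is not None and title_el.text:
            titles.append(title_el.text.strip())
    return titles
```

To read a file from disk instead of a string, `ET.parse(path).getroot()` can replace the `ET.fromstring` call.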

Run Baselines for KDD Cup 2024

First, download the DBLP dataset from AMiner. Put the unzipped PST directory into data/ and the unzipped DBLP dataset into data/PST/.

cd $project_path
export CUDA_VISIBLE_DEVICES='?'  # specify which GPU(s) to be used
export PYTHONPATH="`pwd`:$PYTHONPATH"

# Method 1: Random Forest
python rf/process_kddcup_data.py
python rf/model_rf.py  # output at out/kddcup/rf/

# Method 2: Network Embedding
python net_emb.py  # output at out/kddcup/prone/

# Method 3: SciBERT
python bert.py  # output at out/kddcup/scibert/

Results on Validation Set

Method          MAP
Random Forest   0.21420
ProNE           0.21668
SciBERT         0.29489
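MAP here stands for mean average precision. The sketch below shows one common way to compute it, assuming that each paper has a list of candidate references with binary labels (1 for a source paper, 0 otherwise) and predicted scores. It is a minimal illustration and not the official evaluation script: the function names are hypothetical, and skipping papers that have no positive label is an assumption.

```python
from typing import List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one paper's ranked candidate references."""
    # Rank candidates by predicted score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    papers: List[Tuple[Sequence[int], Sequence[float]]]
) -> float:
    """Mean of per-paper average precision.

    Assumption: papers without any positive label are skipped.
    """
    aps = [average_precision(labels, scores)
           for labels, scores in papers if any(labels)]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, a paper with labels `[1, 0, 1]` and scores `[0.9, 0.8, 0.7]` has positives at ranks 1 and 3, so its average precision is (1/1 + 2/3) / 2.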

Citation

If you find this repo useful in your research, please cite the following papers:

@article{zhang2024pst,
  title={PST-Bench: Tracing and Benchmarking the Source of Publications},
  author={Fanjin Zhang and Kun Cao and Yukuo Cen and Jifan Yu and Da Yin and Jie Tang},
  journal={arXiv preprint arXiv:2402.16009},
  year={2024}
}

@article{zhang2024oag,
  title={OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining},
  author={Fanjin Zhang and Shijie Shi and Yifan Zhu and Bo Chen and Yukuo Cen and Jifan Yu and Yelin Chen and Lulu Wang and Qingfei Zhao and Yuqing Cheng and Tianyi Han and Yuwei An and Dan Zhang and Weng Lam Tam and Kun Cao and Yunhe Pang and Xinyu Guan and Huihui Yuan and Jian Song and Xiaoyan Li and Yuxiao Dong and Jie Tang},
  journal={arXiv preprint arXiv:2402.15810},
  year={2024}
}
