pip install -r requirements.txt
- python==3.11
- pytorch==2.2.1
- transformers==4.38.2
- openai==0.27.8
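After installing, the pinned versions above can be sanity-checked with a short script. This is only a sketch: it assumes PyTorch is installed under its PyPI distribution name `torch`, and the helper name `find_mismatches` is made up for this example and is not part of the repository.

```python
from importlib import metadata
import sys

# Pins copied from the list above. PyTorch is published on PyPI as "torch",
# so that is the name looked up here (assumption about the installed package).
PINNED = {
    "torch": "2.2.1",
    "transformers": "4.38.2",
    "openai": "0.27.8",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # A local build tag such as "2.2.1+cu121" still counts as a match.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 11):
        print(f"python: expected 3.11, found {sys.version_info.major}.{sys.version_info.minor}")
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

If the script prints nothing, the environment matches the pins listed above.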
- Download the query files from https://cloud.tsinghua.edu.cn/d/811f767164994c268679/ and put them into "./data".
- File hierarchy
UntraWiki
├── data
│   ├── query
│   │   ├── cls_1.json
│   │   ├── cls_2.json
│   │   ├── ...
│   │   └── cls_T.json
│   │
│   ├── ent2sents.json
│   ├── ent2text.json
│   └── entities.txt
│
├── GenExpan
│
├── src
│   ├── dataset_for_cl.py
│   ├── dataset_for_ent_predict.py
│   ├── expand.py
│   ├── inferencer.py
│   ├── main.py
│   ├── make_cln2groups.py
│   ├── make_ent2ids.py
│   ├── model.py
│   ├── train_mlm.py
│   └── utils.py
│
├── appendix.pdf
├── README.md
├── requirements.txt
├── run_base.sh
├── run_cl.sh
└── run_ra.sh
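Before running anything, it can help to confirm that the data directory matches the hierarchy above. The sketch below treats every entry listed under `data` as required, which is an assumption, and matches the query files by the `cls_*.json` pattern because their exact number is not stated. The function name `check_data_dir` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the hierarchy above.
REQUIRED_FILES = ("ent2sents.json", "ent2text.json", "entities.txt")


def check_data_dir(data_dir):
    """Return a list of problems; an empty list means the layout looks complete."""
    data_dir = Path(data_dir)
    problems = [
        f"missing file: {data_dir / name}"
        for name in REQUIRED_FILES
        if not (data_dir / name).is_file()
    ]
    query_dir = data_dir / "query"
    if not query_dir.is_dir():
        problems.append(f"missing directory: {query_dir}")
    elif not any(query_dir.glob("cls_*.json")):
        problems.append(f"no cls_*.json files in {query_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_data_dir("./data"):
        print(problem)
```

Run it from the repository root; no output means all expected files were found.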
`run_base.sh`, `run_cl.sh`, and `run_ra.sh` are the running scripts for three methods: RetExpan, RetExpan with Ultra-fine-grained Contrastive Learning, and RetExpan with Entity-based Retrieval Augmentation, respectively. The correspondence is shown in the following table:
| Script Name | Method |
|---|---|
| `run_base.sh` | RetExpan |
| `run_cl.sh` | RetExpan with Ultra-fine-grained Contrastive Learning |
| `run_ra.sh` | RetExpan with Entity-based Retrieval Augmentation |
- We use 8 RTX 3090 GPUs with 24GB of VRAM each for training and inference. In the `run*.sh` scripts, we set the GPU usage through `gpu_group="0,1,2,3,4,5,6,7"`.
- If you want to expand entities with RetExpan, run this:
bash run_base.sh
The expansion results will be saved in `./data/expand_results_base`.
- If you want to expand entities with RetExpan with Ultra-fine-grained Contrastive Learning, run this:
bash run_cl.sh
The expansion results will be saved in `./data/expand_results_cl2`.
- If you want to expand entities with RetExpan with Entity-based Retrieval Augmentation, run this:
bash run_ra.sh
The expansion results will be saved in `./data/expand_results_ra`.
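For scripted experiments, the three commands above can be wrapped in a small Python launcher. This is a sketch and not part of the repository: the keys `base`, `cl`, and `ra` and the function names are labels chosen for this example, while the script names and result directories are the ones given in this README.

```python
import subprocess

# Script and result directory for each method, as listed in this README.
# The short keys are hypothetical labels used only in this sketch.
METHODS = {
    "base": ("run_base.sh", "./data/expand_results_base"),
    "cl": ("run_cl.sh", "./data/expand_results_cl2"),
    "ra": ("run_ra.sh", "./data/expand_results_ra"),
}


def build_command(method):
    """Return the argument list that launches the script for the given method key."""
    script, _ = METHODS[method]
    return ["bash", script]


def run_method(method):
    """Run the script and return the directory where its results are expected."""
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(build_command(method), check=True)
    return METHODS[method][1]
```

Calling `run_method("cl")` from the repository root is equivalent to `bash run_cl.sh` and returns `./data/expand_results_cl2`.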