THUKElab / UltraWiki

🏝️ UltraWiki: Ultra-fine-grained Entity Set Expansion with Negative Seed Entities

🔬 Dependencies

pip install -r requirements.txt

Details

  • python==3.11
  • pytorch==2.2.1
  • transformers==4.38.2
  • openai==0.27.8
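
To confirm that an environment matches the pinned versions above, a small check along these lines may help. This is a sketch and not part of the repository: the helper name `find_version_mismatches` is hypothetical, and it assumes PyTorch is installed under its PyPI distribution name `torch`. The Python interpreter version is not checked here.

```python
from importlib import metadata

# Pins copied from the dependency list; "torch" is the PyPI name for pytorch.
PINNED = {
    "torch": "2.2.1",
    "transformers": "4.38.2",
    "openai": "0.27.8",
}


def find_version_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not met.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build tags such as "2.2.1+cu121".
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_version_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every pinned package is installed at the expected version.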

📚 Dataset (UltraWiki)

UltraWiki
├── data
│   ├── query
│   │   ├── cls_1.json
│   │   ├── cls_2.json
│   │   ├── ...
│   │   └── cls_T.json
│   │
│   ├── ent2sents.json
│   ├── ent2text.json
│   └── entities.txt
│
├── GenExpan
│
├── src
│   ├── dataset_for_cl.py
│   ├── dataset_for_ent_predict.py
│   ├── expand.py
│   ├── inferencer.py
│   ├── main.py
│   ├── make_cln2groups.py
│   ├── make_ent2ids.py
│   ├── model.py
│   ├── train_mlm.py
│   └── utils.py
│
├── appendix.pdf
├── README.md
├── requirements.txt
├── run_base.sh
├── run_cl.sh
└── run_ra.sh
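
Before training, it can be useful to verify that a checkout contains the files shown in the tree above. The following is a minimal sketch under stated assumptions: the helper `check_layout` is hypothetical and not part of the repository, it only tests for the presence of files, and it makes no assumption about what the JSON files contain.

```python
from pathlib import Path

# Fixed files named in the directory tree, relative to the repository root.
EXPECTED_FILES = [
    "data/ent2sents.json",
    "data/ent2text.json",
    "data/entities.txt",
    "requirements.txt",
    "run_base.sh",
    "run_cl.sh",
    "run_ra.sh",
]


def check_layout(root):
    """Return (missing_files, query_files) for a checkout rooted at `root`."""
    root = Path(root)
    missing = [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
    # Query files follow the cls_<n>.json naming pattern shown in the tree.
    query_dir = root / "data" / "query"
    query_files = sorted(p.name for p in query_dir.glob("cls_*.json"))
    return missing, query_files


if __name__ == "__main__":
    missing, query_files = check_layout(".")
    print(f"{len(query_files)} query files found")
    for rel in missing:
        print(f"missing: {rel}")
```

Run from the repository root, the script reports how many query files were found and lists any expected file that is absent.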

🚀 Train and Evaluate


  • run_base.sh, run_cl.sh, and run_ra.sh are the running scripts for three methods: RetExpan, RetExpan with Ultra-fine-grained Contrastive Learning, and RetExpan with Entity-based Retrieval Augmentation. The mapping between scripts and methods is shown in the following table:

    Script Name    Method
    run_base.sh    RetExpan
    run_cl.sh      RetExpan with Ultra-fine-grained Contrastive Learning
    run_ra.sh      RetExpan with Entity-based Retrieval Augmentation

  • We use 8 RTX 3090 GPUs with 24GB of VRAM each for training and inference. In each run*.sh script, the GPUs to use are set through gpu_group="0,1,2,3,4,5,6,7".

  • If you want to expand entities with RetExpan, run this:

bash run_base.sh

The expansion results will be saved in ./data/expand_results_base.

  • If you want to expand entities with RetExpan with Ultra-fine-grained Contrastive Learning, run this:
bash run_cl.sh

The expansion results will be saved in ./data/expand_results_cl2.

  • If you want to expand entities with RetExpan with Entity-based Retrieval Augmentation, run this:
bash run_ra.sh

The expansion results will be saved in ./data/expand_results_ra.
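
The three commands above can also be driven from Python, for instance when running the methods one after another. The sketch below is an illustration and not part of the repository: the short keys `base`, `cl`, and `ra` and the helpers `build_command` and `run_method` are hypothetical names, while the script names and output directories are the ones listed above. It makes no assumption about the format of the result files and only lists what the output directory contains.

```python
import subprocess
from pathlib import Path

# Hypothetical short key -> (script, output directory), as documented above.
METHODS = {
    "base": ("run_base.sh", "data/expand_results_base"),
    "cl": ("run_cl.sh", "data/expand_results_cl2"),
    "ra": ("run_ra.sh", "data/expand_results_ra"),
}


def build_command(method):
    """Return the shell command that launches the given method."""
    script, _ = METHODS[method]
    return ["bash", script]


def run_method(method, repo_root="."):
    """Run one method and return the file names found in its output directory."""
    _, out_dir = METHODS[method]
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(build_command(method), cwd=repo_root, check=True)
    results = Path(repo_root) / out_dir
    if not results.is_dir():
        return []
    return sorted(p.name for p in results.iterdir())
```

Calling `run_method("cl")` from the repository root is then equivalent to `bash run_cl.sh`, followed by a listing of ./data/expand_results_cl2.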

💡 Acknowledgement

  • We appreciate ProbExpan, MESED, and many other related works for their open-source contributions.
