# Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER
This repo provides the model, code & data of our paper: Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER (ACL 2022). [PDF]
## Overview

Our demonstration-based learning framework for NER integrates the prompt into the input itself to build better input representations for token classification. Concatenating even simple demonstrations to the input can improve performance.
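As a rough illustration of the idea (the sentence, demonstration wording, and separator token below are illustrative, not the exact formats used in this repo; see the Valid Combination Table for the actual template names), a demonstration is simply concatenated to the input sequence, while labels are still predicted only for the original tokens:

```python
# Sketch: appending a demonstration to the input for token classification.
# The [SEP] separator and the demonstration wording are assumptions for
# illustration, not the repo's exact format.
SEP = "[SEP]"

def build_input(tokens, demonstration):
    """Concatenate a demonstration string after the original tokens.

    The demonstration enriches the input representation; only the first
    len(tokens) positions receive NER labels.
    """
    demo_tokens = demonstration.split()
    return tokens + [SEP] + demo_tokens, len(tokens)

tokens = ["Obama", "visited", "Paris", "."]
# An entity-oriented demonstration: example entities with their types.
demo = "Obama is a person entity . Paris is a location entity ."
model_input, num_labeled = build_input(tokens, demo)
print(model_input[:5])  # original tokens followed by the separator
print(num_labeled)      # 4 -> only the original 4 tokens are labeled
```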
## Table of contents

1. [Setup](#setup)
2. [Valid Combination Table](#valid-combination-table)
3. [Running](#running)
   - 3.1. [Single run](#single-run)
   - 3.2. [Multiple runs](#multiple-runs)
## Setup

- *Optional*: Create and activate your conda/virtual environment.

- Run:

  ```bash
  pip install -r requirements.txt
  ```

- *Optional*: Add support for CUDA. We have tested the repository on PyTorch 1.7.1 with CUDA 10.1.

  ```bash
  # conda
  conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.1 -c pytorch

  # pip
  pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
  ```

- *Important*: Locate your Python libraries directory and replace its `bert_score/score.py` with the `score.py` provided in this repository. We made some changes to cache the model and avoid reloading it on every call. For example:

  ```bash
  cp score.py ~/.conda/envs/<ENV_NAME>/lib/python3.6/site-packages/bert_score/score.py
  ```
## Valid Combination Table

| Prompt | Template | Description |
|---|---|---|
| `max` | `no_context`, `context`, `lexical` | Entity-oriented demonstration - Popular |
| `random` | `no_context`, `context`, `lexical` | Entity-oriented demonstration - Random |
| `sbert` | `context_all`, `lexical_all` | Instance-oriented demonstration - SBERT |
| `bertscore` | `context_all`, `lexical_all` | Instance-oriented demonstration - BERTSCORE |
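The table can be read as a small validity map; a sketch (the dict and helper name are ours, not part of the repo's code):

```python
# Valid prompt -> template combinations, transcribed from the table above.
VALID_COMBINATIONS = {
    "max":       {"no_context", "context", "lexical"},
    "random":    {"no_context", "context", "lexical"},
    "sbert":     {"context_all", "lexical_all"},
    "bertscore": {"context_all", "lexical_all"},
}

def is_valid(prompt, template):
    """Return True if the prompt/template pair appears in the table."""
    return template in VALID_COMBINATIONS.get(prompt, set())

print(is_valid("max", "context"))       # True
print(is_valid("sbert", "no_context"))  # False: instance-oriented prompts
                                        # only pair with *_all templates
```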
## Running

Possible values for:

- `<DATASET>`: `conll`, `ontonotes_conll`, `bc5cdr`
- `<PROMPT>`: from the table above
- `<TEMPLATE>`: from the table above
- `<SUFFIX>`: `25`, `50`
- `<TRAIN_SEED>`: `42`, `1337`, `2021`
- `<SAMPLE_SEED>`: `42`, `1337`, `2021`, `5555`, `9999`
- `<CHECK_POINT>`: saved checkpoint
### Single run

Execute a single run.

- In-domain setting

  ```bash
  scripts/in_domain/in_domain_one.sh <DATASET> <SHOT> <PROMPT> <TEMPLATE> <TRAIN_SEED> <SAMPLE_SEED>
  ```

- Domain adaptation setting

  ```bash
  scripts/domain_adaptation/domain_adaptation_one.sh <DATASET> <SHOT> <PROMPT> <TEMPLATE> <TRAIN_SEED> <SAMPLE_SEED> <CHECK_POINT>
  ```
### Multiple runs

This setting executes all 15 runs, i.e., 5 different sub-samples × 3 training seeds.

- In-domain setting (remember to configure the parameters at the top of the script)

  ```bash
  scripts/in_domain/in_domain_all.sh
  ```

- Domain adaptation setting

  ```bash
  scripts/domain_adaptation/domain_adaptation_all.sh
  ```
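The 15 configurations are the cross product of the 5 sample seeds and 3 training seeds listed under "Running"; a quick sketch of the enumeration:

```python
from itertools import product

# Seed values from the "Possible values" list above.
TRAIN_SEEDS = [42, 1337, 2021]
SAMPLE_SEEDS = [42, 1337, 2021, 5555, 9999]

# Each run is one (sample_seed, train_seed) pair: 5 x 3 = 15 runs.
runs = list(product(SAMPLE_SEEDS, TRAIN_SEEDS))
print(len(runs))  # 15
```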
## Running Prompt Search

| Prompt | Template |
|---|---|
| `search` | `no_context`, `context`, `lexical` |

- Search for the best entities (based on only one seed):

  ```bash
  python3 search.py \
      --dataset <DATASET> \
      --data_dir dataset/<DATASET> \
      --model_folder models/<DATASET>/conll_max_context \
      --device cuda:0 \
      --percent_filename_suffix <SEEDED_SUFFIX> \
      --template <TEMPLATE>
  ```

- Run with the best entities:

  ```bash
  python sampling_run.py \
      --train_file search_run.py \
      --dataset <DATASET> \
      --data_dir dataset/<DATASET> \
      --gpu 0 \
      --suffix <SUFFIX> \
      --template <TEMPLATE>
  ```
## Citation

If you find our work helpful, please cite the following:

```bibtex
@InProceedings{lee2021fewner,
  author    = {Lee, Dong-Ho and Kadakia, Akshen and Tan, Kangmin and Agarwal, Mahak and Feng, Xinyu and Shibuya, Takashi and Mitani, Ryosuke and Sekiya, Toshiyuki and Pujara, Jay and Ren, Xiang},
  title     = {Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER},
  year      = {2022},
  booktitle = {Association for Computational Linguistics (ACL)},
}
```