# Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER
This repo provides the model, code & data of our paper: Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER (ACL 2022). [PDF]
## Overview

Our demonstration-based learning framework for NER integrates the prompt into the input itself to build better input representations for token classification. Concatenating even simple demonstrations to the input can improve performance.
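As a rough illustration of the idea (the sentence, demonstration wording, and separator token below are illustrative, not the exact formats used in this repo; see the Valid Combination Table for the actual template names), a demonstration is simply concatenated to the input sequence, while labels are still predicted only for the original tokens:

```python
# Sketch: appending a demonstration to the input for token classification.
# The [SEP] separator and the demonstration wording are assumptions for
# illustration, not the repo's exact format.
SEP = "[SEP]"

def build_input(tokens, demonstration):
    """Concatenate a demonstration string after the original tokens.

    The demonstration enriches the input representation; only the first
    len(tokens) positions receive NER labels.
    """
    demo_tokens = demonstration.split()
    return tokens + [SEP] + demo_tokens, len(tokens)

tokens = ["Obama", "visited", "Paris", "."]
# An entity-oriented demonstration: example entities with their types.
demo = "Obama is a person entity . Paris is a location entity ."
model_input, num_labeled = build_input(tokens, demo)
print(model_input[:5])  # original tokens followed by the separator
print(num_labeled)      # 4 -> only the original 4 tokens are labeled
```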
## Table of contents

1. [Setup](#setup)
2. [Valid Combination Table](#valid-combination-table)
3. [Running](#running)
   - 3.1. [Single run](#single-run)
   - 3.2. [Multiple runs](#multiple-runs)
## Setup

- *Optional*: Create and activate your conda/virtual environment.

- Run:

  ```bash
  pip install -r requirements.txt
  ```

- *Optional*: Add support for CUDA. We have tested the repository on PyTorch 1.7.1 with CUDA 10.1.

  ```bash
  # conda
  conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.1 -c pytorch

  # pip
  pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
  ```

- *Important*: Locate your Python libraries directory and replace its `bert_score/score.py` with the `score.py` provided in this repository. We made some changes to cache the model and avoid reloading it on every call. For example:

  ```bash
  cp score.py ~/.conda/envs/<ENV_NAME>/lib/python3.6/site-packages/bert_score/score.py
  ```
## Valid Combination Table

| Prompt | Template | Description |
|---|---|---|
| `max` | `no_context`, `context`, `lexical` | Entity-oriented demonstration - Popular |
| `random` | `no_context`, `context`, `lexical` | Entity-oriented demonstration - Random |
| `sbert` | `context_all`, `lexical_all` | Instance-oriented demonstration - SBERT |
| `bertscore` | `context_all`, `lexical_all` | Instance-oriented demonstration - BERTSCORE |
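The table can be read as a small validity map; a sketch (the dict and helper name are ours, not part of the repo's code):

```python
# Valid prompt -> template combinations, transcribed from the table above.
VALID_COMBINATIONS = {
    "max":       {"no_context", "context", "lexical"},
    "random":    {"no_context", "context", "lexical"},
    "sbert":     {"context_all", "lexical_all"},
    "bertscore": {"context_all", "lexical_all"},
}

def is_valid(prompt, template):
    """Return True if the prompt/template pair appears in the table."""
    return template in VALID_COMBINATIONS.get(prompt, set())

print(is_valid("max", "context"))       # True
print(is_valid("sbert", "no_context"))  # False: instance-oriented prompts
                                        # only pair with *_all templates
```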
## Running

Possible values for:

- `<DATASET>`: `conll`, `ontonotes_conll`, `bc5cdr`
- `<PROMPT>`: from the table above
- `<TEMPLATE>`: from the table above
- `<SUFFIX>`: `25`, `50`
- `<TRAIN_SEED>`: `42`, `1337`, `2021`
- `<SAMPLE_SEED>`: `42`, `1337`, `2021`, `5555`, `9999`
- `<CHECK_POINT>`: saved checkpoint
### Single run

Execute a single run.

- In-domain setting

  ```bash
  scripts/in_domain/in_domain_one.sh <DATASET> <SHOT> <PROMPT> <TEMPLATE> <TRAIN_SEED> <SAMPLE_SEED>
  ```

- Domain adaptation setting

  ```bash
  scripts/domain_adaptation/domain_adaptation_one.sh <DATASET> <SHOT> <PROMPT> <TEMPLATE> <TRAIN_SEED> <SAMPLE_SEED> <CHECK_POINT>
  ```
### Multiple runs

This setting executes all 15 runs, i.e., 5 different sub-samples × 3 training seeds.

- In-domain setting (remember to configure the parameters at the top of the script)

  ```bash
  scripts/in_domain/in_domain_all.sh
  ```

- Domain adaptation setting

  ```bash
  scripts/domain_adaptation/domain_adaptation_all.sh
  ```
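The 15 configurations are the cross product of the 5 sample seeds and 3 training seeds listed under "Running"; a quick sketch of the enumeration:

```python
from itertools import product

# Seed values from the "Possible values" list above.
TRAIN_SEEDS = [42, 1337, 2021]
SAMPLE_SEEDS = [42, 1337, 2021, 5555, 9999]

# Each run is one (sample_seed, train_seed) pair: 5 x 3 = 15 runs.
runs = list(product(SAMPLE_SEEDS, TRAIN_SEEDS))
print(len(runs))  # 15
```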
## Running Prompt Search

| Prompt | Template |
|---|---|
| `search` | `no_context`, `context`, `lexical` |

- Search for the best entities (based on only one seed):

  ```bash
  python3 search.py \
      --dataset <DATASET> \
      --data_dir dataset/<DATASET> \
      --model_folder models/<DATASET>/conll_max_context \
      --device cuda:0 \
      --percent_filename_suffix <SEEDED_SUFFIX> \
      --template <TEMPLATE>
  ```

- Run with the best entities:

  ```bash
  python sampling_run.py \
      --train_file search_run.py \
      --dataset <DATASET> \
      --data_dir dataset/<DATASET> \
      --gpu 0 \
      --suffix <SUFFIX> \
      --template <TEMPLATE>
  ```
## Citation

If you find our work helpful, please cite the following:

```bibtex
@InProceedings{lee2021fewner,
  author    = {Lee, Dong-Ho and Kadakia, Akshen and Tan, Kangmin and Agarwal, Mahak and Feng, Xinyu and Shibuya, Takashi and Mitani, Ryosuke and Sekiya, Toshiyuki and Pujara, Jay and Ren, Xiang},
  title     = {Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER},
  year      = {2022},
  booktitle = {Association for Computational Linguistics (ACL)},
}
```