Impact of Sample Selection on In-Context Learning for Entity Extraction from Scientific Writing

This repository provides the system used in our work on in-context learning (ICL) sample selection methods for the scientific entity extraction task.

Installation

Install the required libraries:

pip install easyinstruct -i https://pypi.org/simple
pip install --upgrade openai
pip install transformers
pip install datasets

We evaluate on five scientific entity extraction datasets: ADE, MeasEval, SciERC, STEM-ECR, and WLPC.

Method	ADE	MeasEval	SciERC	STEM-ECR	WLPC
Baseline Models
RoBERTa	90.42	56.68	68.52	69.70	28.36
Zero-shot	71.29	19.65	17.86	28.89	31.64
Random	74.56	22.49	29.27	26.85	32.20
In-context sample selection methods
KATE	83.11	22.75	29.97	30.78	45.02
Perplexity	79.13	21.43	31.31	26.57	30.46
BM25	77.28	24.72	35.96	25.61	44.14
Influence	86.35	27.13	36.47	27.81	45.41

Method	ADE	MeasEval	SciERC	STEM-ECR	WLPC
RoBERTa full	90.42	56.68	68.52	69.70	28.36
Baseline Models
RoBERTa 1%	14.32	19.20	10.16	15.42	10.37
Zero-shot	71.29	19.65	17.86	28.89	31.64
Random 1%	66.53	21.32	25.31	21.38	28.46
In-context sample selection methods
KATE 1%	69.06	24.48	26.78	26.49	28.97
Perplexity	68.83	22.23	26.42	25.48	26.05
BM25 1%	72.66	23.39	31.33	24.24	36.73
Influence 1%	73.68	24.21	32.49	25.01	34.24
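
For readers unfamiliar with the retrieval-based rows in the tables above, here is a minimal sketch of similarity-based sample selection in the spirit of KATE: for each test sentence, pick the k most similar training sentences as in-context demonstrations. It is not the code in icl_sample.py. It assumes token-count cosine similarity as a stand-in for the sentence embeddings a real setup would use, and the function names and toy sentences are hypothetical.

```python
import math
from collections import Counter


def bow_vector(text):
    """Lowercased whitespace-token counts (an assumed stand-in for an embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_demonstrations(test_sentence, train_sentences, k=2):
    """Return the k training sentences most similar to the test sentence."""
    query = bow_vector(test_sentence)
    scored = [
        (cosine(query, bow_vector(sentence)), index)
        for index, sentence in enumerate(train_sentences)
    ]
    # Highest similarity first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [train_sentences[index] for _, index in scored[:k]]


# Toy training pool, invented for illustration only.
train_pool = [
    "The patient developed a rash after taking amoxicillin .",
    "We measured a temperature of 300 K in the chamber .",
    "Nausea was reported after taking ibuprofen .",
]
picked = select_demonstrations(
    "The patient reported nausea after taking aspirin .", train_pool, k=2
)
for sentence in picked:
    print(sentence)
```

The selected sentences would then be placed in the prompt ahead of the test sentence. Swapping `cosine` for a different scoring function (for example a BM25 score) changes the selection strategy without touching the rest of the loop.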

python icl_sample.py \
    --data \
    --metric \
    --embed \
    --model \
    --trained \
    --reversed \
    --train_file \
    --test_file

python icl_evaluate.py \
    --data \
    --metric \
    --icl_file_name \
    --model \
    --train_file \
    --test_file