adalin16 / ICL_EE

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Impact of Sample Selection on In-Context Learning for Entity Extraction from Scientific Writing

This repository provides the system used in our work for in-context learning (ICL) sample selection methods for scientific entity extraction task.

Installation

Install the required libraries

pip install easyinstruct -i https://pypi.org/simple
pip install --upgrade openai
pip install transformers
pip install datasets

Datasets

We use five scientific entity extraction datasets.

  • ADE
  • MeasEval
  • SciERC
  • STEM-ECR
  • WLPC

Results

Main Results

Method ADE MeasEval SciERC STEM-ECR WLPC
Baseline Models
RoBERTa 90.42 56.68 68.52 69.70 28.36
Zero-shot 71.29 19.65 17.86 28.89 31.64
Random 74.56 22.49 29.27 26.85 32.20
In-context sample selecting methods
KATE 83.11 22.75 29.97 30.78 45.02
Perplexity 79.13 21.43 31.31 26.57 30.46
BM25 77.28 24.72 35.96 25.61 44.14
Influence 86.35 27.13 36.47 27.81 45.41

Low-Resource Scenario

Method ADE MeasEval SciERC STEM-ECR WLPC
RoBERTa full 90.42 56.68 68.52 69.70 28.36
Baseline Models
RoBERTa %1 14.32 19.20 10.16 15.42 10.37
Zero-shot 71.29 19.65 17.86 28.89 31.64
Random %1 66.53 21.32 25.31 21.38 28.46
In-context sample selecting methods
KATE %1 69.06 24.48 26.78 26.49 28.97
Perplexity 68.83 22.23 26.42 25.48 26.05
BM25 %1 72.66 23.39 31.33 24.24 36.73
Influence %1 73.68 24.21 32.49 25.01 34.24

Running Experiments

Sample Selection

python icl_sample.py \
    --data \
    --metric \
    --embed \
    --model \
    --trained \
    --reversed \
    --train_file \
    --test_file

Evaluation

python icl_evaluate.py \
    --data --metric \
    --icl_file_name \
    --model \
    --train_file \ 
    --test_file

About


Languages

Language:Python 100.0%