Shivanshu-Gupta / in-context-learning

Easy in-context learning experiments with a variety of datasets, LLMs, and example selectors.

Example Selection for In-Context Learning

Framework for convenient in-context learning (ICL) evaluations across different datasets, LLMs, and example selection methods. In particular, it is used to evaluate the in-context example selection methods proposed in the following papers (see Citation below):

  • Coverage-based Example Selection for In-Context Learning (Findings of EMNLP 2023)
  • GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks (arXiv:2311.09606)

Apart from the above, it also supports the Random, BM25, and SentenceBERT (Cosine) selectors. See src/constants.py for the list of datasets and LLMs that have currently been evaluated.
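
To make the selector idea concrete, below is a minimal sketch of cosine-similarity example selection with SentenceBERT. It is independent of this repo's actual selector interface in src/selector/; the model checkpoint and the flat list-of-strings API are assumptions for illustration only.

    # Sketch: pick the k training examples most similar to a test input.
    from sentence_transformers import SentenceTransformer, util

    def select_examples(train_inputs: list[str], test_input: str, k: int = 8) -> list[int]:
        model = SentenceTransformer("all-MiniLM-L6-v2")  # any SBERT checkpoint works
        train_emb = model.encode(train_inputs, convert_to_tensor=True, normalize_embeddings=True)
        test_emb = model.encode(test_input, convert_to_tensor=True, normalize_embeddings=True)
        scores = util.cos_sim(test_emb, train_emb)[0]    # similarity to each training example
        return scores.topk(k).indices.tolist()           # indices of the k nearest examples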

Setup

  1. Download the datasets unavailable on HuggingFace from https://1drv.ms/u/s!AqJNiE6C-nXuoawBxh-3rfUsSf4-8A?e=3o1YDK and store them in data/.
  2. Install Python 3.10.
  3. Install Python dependencies: pip install -r requirements.txt
  4. Some third-party repos:
    1. qdecomp_with_dependency_graphs: required for the DROP dataset.

      mkdir icl-demo-selection/src/third_party
      git clone git@github.com:matanhasson/qdecomp_with_dependency_graphs.git icl-demo-selection/src/third_party/
  5. [Optional] LLM-specific setup:
    1. For experiments with LLaMA models, set the path to the directory containing the downloaded LLaMA weights in langchain.llms.huggingface.get_model_cache_dir.
    2. Experiments with some LLMs may require setting up a HuggingFace auth token by running huggingface-cli login.
    3. Store the OpenAI key(s) in openai_keys.txt in the root directory, one key per line (see the sketch after this list).
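
Since openai_keys.txt holds one key per line, a loader for it might look like the sketch below. The round-robin rotation is an assumption for illustration, not necessarily how this repo consumes the keys.

    # Sketch: read API keys (one per line) and cycle through them.
    import itertools
    from pathlib import Path

    def load_openai_keys(path: str = "openai_keys.txt"):
        keys = [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]
        return itertools.cycle(keys)

    keys_iter = load_openai_keys()
    # openai.api_key = next(keys_iter)  # e.g. rotate to the next key before each batch of requests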

Organization

The repository is organized as follows:

icl
├── data             (local datasets -- download from https://1drv.ms/u/s!AqJNiE6C-nXuoawBxh-3rfUsSf4-8A?e=3o1YDK)
├── results          (icl experiment results and logs)
├── src              (relevant source files described below)
└── openai_keys.txt    (any openai keys, one per line)

Important source files include:

  • src/params.py defines experiment parameters
  • src/constants.py defines some useful enums and constants
  • src/driver.py is the main file to run a single ICL experiment. Instead of running it directly, use src/experiments.py -- it takes care of many default parameters and makes it easy to run multiple experiments.
  • src/experiments.py contains the code to configure experiments, track experiment statuses, and aggregate results. Rather than running experiments itself, it dumps the parameters for all the experiments to a file that is then consumed by src/run.py. Run python experiments.py --help to see the available options.
  • src/run.py runs one or more experiments, sequentially or in parallel, on one or more GPUs, using the parameters file produced by src/experiments.py.
  • src/selector/ contains the implementations for the various selectors
  • src/prompts/ contains templates for single examples and few-shot prompts (a generic illustration of prompt assembly follows this list)
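
As a generic illustration of what such templates produce, the sketch below assembles selected examples into a few-shot prompt. The "Input:/Output:" template strings are hypothetical; the repo's real templates live in src/prompts/.

    # Sketch: turn (input, output) demonstrations into a few-shot prompt.
    def build_prompt(examples: list[tuple[str, str]], test_input: str) -> str:
        demos = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
        return f"{demos}\n\nInput: {test_input}\nOutput:"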

Workflows

Running ICL Evaluations

src/experiments.py and src/run.py are the main files to run ICL evaluations. The following are some example workflows:

  1. Generate the parameters for 8-shot ICL on all the datasets with the LLaMA-7B LLM and in-context examples selected using the Cosine, BERTScore, and GistScore selectors, and dump them to params/all.jsonl. See experiments.main for detailed usage.

    python experiments.py --label "test" --seeds 0 \
    --datasets "QNLI;MNLI;RTE;SST2;YELP;MRPC;QQP;PAWS;COPA;PIQA;WINOGRANDE;WSC;CMSQA;COLA;COMMONGEN;E2ENLG;DART;SST5;AGNEWS;AESLC;SMCALFLOW_CS;BREAK;MTOP;COGS" \
    --selectors "cosine;bertscore;gist_bertscore" \
    --lms "llama-7B" \
    --n-shots 8 --baselines-exp \
    --paramsfile "params/all.jsonl" --run \
    --no-collate-results \
    --preview "logfiles"
  2. Run the experiments in params/all.jsonl in parallel on GPUs 0 and 1 (a conceptual sketch of this dispatch follows the command below).

    python run.py --paramsfile "params/all.jsonl" --gpus "0,1"
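
Conceptually, the parallel dispatch works like the sketch below: each experiment is pinned to a free GPU via CUDA_VISIBLE_DEVICES. The JSONL-of-parameter-dicts format and the worker invocation are assumptions for illustration; the real logic lives in src/run.py.

    # Sketch: run one experiment per free GPU until the params file is exhausted.
    import json, os, queue, subprocess
    from concurrent.futures import ThreadPoolExecutor

    def run_all(paramsfile: str, gpus: list[str]) -> None:
        params = [json.loads(line) for line in open(paramsfile)]
        free = queue.Queue()
        for gpu in gpus:
            free.put(gpu)

        def worker(param: dict) -> None:
            gpu = free.get()  # block until some GPU is free
            try:
                env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
                # Hypothetical worker command; the real one is built in src/run.py.
                subprocess.run(["python", "src/driver.py", json.dumps(param)], env=env, check=True)
            finally:
                free.put(gpu)  # return the GPU to the pool

        with ThreadPoolExecutor(max_workers=len(gpus)) as pool:
            list(pool.map(worker, params))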

NOTE: To run ICL evaluations with GistScore, see the gist-icl repo.

Adding a new dataset

  1. Update constants.Dataset and constants.category2datasets.
  2. Add a parameters class for it in src/data_params.py, similar to the other datasets (a hypothetical sketch follows this list).
    1. If it requires a new metric, add it to prompts/base.py
    2. Test it using data_params.test_dataset or data_params.test.
  3. For ICL evaluation, some of these might also be necessary (though rare):
    1. If it requires any default arguments, add them to exp_utils.dataset_args_d
    2. If it has more than one split, add them to exp_utils.ds2splits. If it has more than one test split, record those in exp_utils.dataset_args_d (similar to COGS).
    3. If it requires a new metric, add the name for that metric to the metric_cols lists in experiments.make_tables.
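
For orientation, a hypothetical parameters class might look like the sketch below. The class name, fields, and defaults are placeholders; the real requirements are defined by the existing classes in src/data_params.py.

    # Sketch: placeholder parameters for a new dataset (all names are illustrative).
    from dataclasses import dataclass

    @dataclass
    class MyTaskParams:
        name: str = "MYTASK"         # should match the new constants.Dataset member
        hf_path: str = "glue"        # HuggingFace dataset path, if hosted there
        hf_name: str = "mytask"      # HuggingFace config name
        input_col: str = "sentence"  # field holding the model input
        output_col: str = "label"    # field holding the target
        metric: str = "accuracy"     # new metrics go in prompts/base.py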

Miscellaneous Tips

There are two different types of command-line interfaces in this repository:

  1. Typer - used for non-nested parameterization. It allows multiple commands in a single script, run as python <script> <command> <arguments>. The <command> only needs to be specified if the script has more than one command (e.g. src/data_params.py). The <arguments> are specified a bit differently from standard flags, so run with --help to see them (a minimal Typer example follows this list). Scripts using Typer:
    1. src/experiments.py
    2. src/run.py
    3. src/data_params.py
  2. Hydra - used for more deeply nested parameterization. Scripts using Hydra:
    1. src/driver.py: parameters defined in src/params.py:AllParams
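
For reference, a minimal Typer script with two commands looks like the sketch below; the command and option names are made up, not taken from this repo.

    import typer

    app = typer.Typer()

    @app.command()
    def test_dataset(name: str = typer.Option(..., help="Dataset to test")):
        print(f"testing {name}")

    @app.command()
    def test():
        print("testing all datasets")

    if __name__ == "__main__":
        app()  # e.g.: python script.py test-dataset --name QNLI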

Citation

If you find this repository useful, please cite the following papers:

@inproceedings{gupta-etal-2023-coverage,
    title = "Coverage-based Example Selection for In-Context Learning",
    author = "Gupta, Shivanshu  and
      Gardner, Matt  and
      Singh, Sameer",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-emnlp.930",
    doi = "10.18653/v1/2023.findings-emnlp.930",
    pages = "13924--13950",
}
@article{gupta2023gistscore,
    title={GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks},
    author={Shivanshu Gupta and Clemens Rosenbaum and Ethan R. Elenberg},
    year={2023},
    eprint={2311.09606},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

License

MIT License