seanmacavaney / autoqrels

autoqrels

autoqrels is a tool for automatically inferring query relevance assessments (qrels).

Currently, it supports the one-shot labeling approach (1SL) presented in MacAvaney and Soldaini, One-Shot Labeling for Automatic Relevance Estimation, SIGIR 2023.

This package adheres to the ir-measures API, which means it can be directly used by various tools, such as PyTerrier.

Getting started

You can install autoqrels using pip:

pip install autoqrels

You can also work with the repository locally:

git clone https://github.com/seanmacavaney/autoqrels.git
cd autoqrels
python setup.py develop

API

The primary interface in autoqrels is autoqrels.Labeler. A Labeler exposes a method, infer_qrels(run, qrels), which returns a new set of qrels that covers the provided run:

  • run is a Pandas DataFrame with the columns query_id (str), doc_id (str), and score (float)
  • qrels is a Pandas DataFrame with the columns query_id (str), doc_id (str), and relevance (int)
  • The return value is a Pandas DataFrame with the columns query_id (str), doc_id (str), and relevance (float)
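To make the expected input shapes concrete, here is a minimal sketch that builds a toy run and toy qrels with the columns listed above, then finds the (query, document) pairs in the run that have no judgment. These are the pairs a labeler would need to infer. The IDs, scores, and labels are made up for illustration, and the join is plain pandas, not part of autoqrels.

```python
import pandas as pd

# A toy run: ranked results for one query (IDs and scores are made up).
run = pd.DataFrame({
    'query_id': ['q1', 'q1', 'q1'],
    'doc_id': ['d1', 'd2', 'd3'],
    'score': [12.5, 11.0, 9.75],
})

# Toy qrels: only d1 has an existing judgment.
qrels = pd.DataFrame({
    'query_id': ['q1'],
    'doc_id': ['d1'],
    'relevance': [1],
})

# Left-join the run against the qrels; rows without a match have NaN relevance.
merged = run.merge(qrels, on=['query_id', 'doc_id'], how='left')
unjudged = merged[merged['relevance'].isna()][['query_id', 'doc_id']]
print(unjudged)
```

With a real labeler, these DataFrames would be passed as labeler.infer_qrels(run, qrels).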

Labelers also expose several measure definitions compatible with ir_measures: labeler.SDCG@k, labeler.RBP(p=persistence), and labeler.P@k. These measures calculate the corresponding effectiveness score, with the labeler's inferred qrels added to the provided ones. See the ir-measures documentation for more details.
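As a rough illustration of how a persistence-based measure like RBP can consume float relevance values, the sketch below computes rank-biased precision by hand over a list of gains in rank order. The rbp function is a hypothetical helper written for this example, and it assumes gains already lie in [0, 1]. It is not the ir_measures or autoqrels implementation, which may treat relevance levels differently.

```python
def rbp(gains, p=0.8):
    """Rank-biased precision over gains listed in rank order (top rank first).

    Hypothetical helper for illustration; assumes each gain is in [0, 1].
    """
    # The document at 0-based rank i is weighted by p**i; (1 - p) normalizes.
    return (1 - p) * sum(g * p ** i for i, g in enumerate(gains))

# Made-up relevance values for the top three documents of one query.
inferred = [1.0, 0.5, 0.0]
score = rbp(inferred, p=0.8)
print(round(score, 4))
```

A higher persistence p spreads weight deeper into the ranking, so unjudged documents further down matter more and filling them with inferred labels has a larger effect on the score.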

We'll now explore the available Labeler implementations.

autoqrels.oneshot: 1SL (One-shot Labeling)

Reproduction: See repro instructions in repro/oneshot.

One-shot labelers work over a single known relevant document per query. An error is raised if multiple relevant documents are provided.
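Before passing qrels to a one-shot labeler, it can help to check which queries would violate the single-relevant-document constraint. The sketch below is a hypothetical pre-check in plain pandas, not part of autoqrels. The function name and the min_rel threshold (the relevance level counted as "relevant") are assumptions for illustration.

```python
import pandas as pd

def queries_with_multiple_relevant(qrels: pd.DataFrame, min_rel: int = 1) -> list:
    """Return query_ids with more than one document at relevance >= min_rel.

    Hypothetical helper; min_rel is an assumed threshold for "relevant".
    """
    relevant = qrels[qrels['relevance'] >= min_rel]
    counts = relevant.groupby('query_id')['doc_id'].nunique()
    return sorted(counts[counts > 1].index.tolist())

# Made-up qrels: q1 has two relevant documents, q2 has exactly one.
qrels = pd.DataFrame({
    'query_id': ['q1', 'q1', 'q2', 'q2'],
    'doc_id': ['d1', 'd2', 'd3', 'd4'],
    'relevance': [1, 1, 1, 0],
})
offending = queries_with_multiple_relevant(qrels)
print(offending)
```

Queries returned by such a check would need to be reduced to one relevant document each, for example by keeping a single judged document per query.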

Example:

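import autoqrels
import ir_datasets

# TREC DL 2019 passage ranking, loaded through ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2019')
# One-shot labeler backed by DuoT5, with a cache file at cache_path
duot5 = autoqrels.oneshot.DuoT5(dataset=dataset, cache_path='data/duot5.cache.json.gz')
# Measures exposed by the labeler (compatible with ir_measures):
duot5.SDCG@10
duot5.P@10
duot5.RBP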

Citation

If you use this work, please cite:

@inproceedings{autoqrels,
  author = {MacAvaney, Sean and Soldaini, Luca},
  title = {One-Shot Labeling for Automatic Relevance Estimation},
  booktitle = {Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year = {2023},
  url = {https://arxiv.org/abs/2302.11266}
}

About

License: Apache License 2.0
