qpp query-performance-prediction information-retrieval query-reformulation

IRQPP

The main script is generate_results.py.
Usage: python3.6 generate_results.py -h
Before running the code, make sure the following files exist:

Results Files

~/QppUqvProj/Results/{corpus}/test/{basic/raw/fused}/QL.res
~/QppUqvProj/Results/{corpus}/test/{basic/raw/fused}/logqlc.res

Parameters Files

~/QppUqvProj/Results/{corpus}/test/indriRunQF.xml
~/QppUqvProj/Results/{corpus}/test/indriRunQL.xml
~/QppUqvProj/Results/{corpus}/test/clarityParam.xml

AP Results Files

~/QppUqvProj/Results/{corpus}/test/aggregated/map1000-{agg function}
~/QppUqvProj/Results/{corpus}/test/single/map1000-{single selection function}

Cross-Validation Files

~/QppUqvProj/Results/{corpus}/test/2_folds_30_repetitions.json
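The file name suggests 2-fold cross-validation repeated 30 times, each repetition using a different random split. The sketch below shows how such splits could be generated. It is not the repo's code, and the JSON layout is hypothetical because this README does not document the schema the repo expects. Use it to understand the splitting, not to regenerate the file. Splitting by topic id would keep all variants of one information need in the same fold, which is an assumption about how the splits are used.

```python
import json
import random


def make_cv_splits(topic_ids, n_folds=2, n_repetitions=30, seed=0):
    """Randomly partition topic_ids into n_folds folds, n_repetitions times.

    Hypothetical helper: the {"<rep>": {"<fold>": [topic ids]}} layout is an
    illustration only, not the documented format of 2_folds_30_repetitions.json.
    """
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    ids = sorted(topic_ids)
    splits = {}
    for rep in range(n_repetitions):
        shuffled = ids[:]
        rng.shuffle(shuffled)
        # Deal topics round-robin so fold sizes differ by at most one
        splits[str(rep)] = {str(f): shuffled[f::n_folds] for f in range(n_folds)}
    return splits


topics = [str(t) for t in range(301, 311)]  # toy topic ids
splits = make_cv_splits(topics)
print(json.dumps(splits["0"], indent=2))
```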

LogQLC stands for the log query likelihood (QL) of the query, computed against the entire corpus.
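One concrete reading of that definition: under a maximum-likelihood corpus language model, log P(q|C) = sum over query terms t of log(cf(t) / |C|), where cf(t) is the collection frequency of t and |C| is the number of tokens in the corpus. The sketch below assumes exactly that estimate; the LogQlC tool may apply stemming, stopping, or a different estimate, so treat this as an illustration rather than a reimplementation.

```python
import math
from collections import Counter


def log_ql_corpus(query_terms, corpus_counts, corpus_length):
    """Log query likelihood of a query under the corpus language model.

    Assumption: maximum-likelihood p(t|C) = cf(t) / |C|, summed in log space
    over every query term occurrence.
    """
    total = 0.0
    for term in query_terms:
        cf = corpus_counts.get(term, 0)
        if cf == 0:
            return float("-inf")  # unseen term has probability zero under the MLE
        total += math.log(cf / corpus_length)
    return total


corpus_tokens = "the cat sat on the mat the dog sat".split()  # toy corpus
counts = Counter(corpus_tokens)
print(log_ql_corpus(["the", "cat"], counts, len(corpus_tokens)))
```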

In general, the code assumes the directory structure shown in the file FS-structure.pdf.
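The required inputs listed above can be checked before a run. A minimal sketch, assuming the results root is ~/QppUqvProj/Results and that all three of the basic, raw and fused directories are needed; required_files and missing_files are hypothetical helpers, not part of the repo. The README does not enumerate the {agg function} and {single selection function} values, so the caller supplies them.

```python
from pathlib import Path


def required_files(corpus, agg_functions=(), single_functions=(),
                   root="~/QppUqvProj/Results"):
    """List the input files named in this README for one corpus (hypothetical helper)."""
    base = Path(root).expanduser() / corpus / "test"
    files = []
    # Results files: assumes each of basic, raw and fused must be present
    for qtype in ("basic", "raw", "fused"):
        files += [base / qtype / "QL.res", base / qtype / "logqlc.res"]
    # Parameter files
    files += [base / name for name in ("indriRunQF.xml", "indriRunQL.xml", "clarityParam.xml")]
    # AP results files; function names are caller-supplied
    files += [base / "aggregated" / f"map1000-{f}" for f in agg_functions]
    files += [base / "single" / f"map1000-{f}" for f in single_functions]
    # Cross-validation file
    files.append(base / "2_folds_30_repetitions.json")
    return files


def missing_files(corpus, **kwargs):
    """Return the required paths that do not exist as regular files."""
    return [p for p in required_files(corpus, **kwargs) if not p.is_file()]


for path in missing_files("ROBUST", single_functions=("max",)):
    print("missing:", path)
```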

To create the files using Indri:
Create QL.res for ROBUST UQV (the list of retrieved results with their QL scores), example:
indri-5.6/runqueryql/IndriRunQueryQL QppUqvProj/Results/ROBUST/test/indriRunQL.xml -threads=8 QppUqvProj/data/ROBUST/fullqueriesUQV.xml > QppUqvProj/Results/ROBUST/test/raw/QL.res
Create logqlc.res, example:
indri-5.6/logqlc/LogQlC QppUqvProj/Results/ROBUST/test/indriRunQL.xml QppUqvProj/data/ROBUST/fullqueriesUQV.xml > QppUqvProj/Results/ROBUST/test/raw/logqlc.res
Create QLmap1000 (raw per-query AP scores), example:
trec_eval -qn -m map QppUqvProj/data/ROBUST/qrelsUQV QppUqvProj/Results/ROBUST/test/raw/QL.res | awk '{print $2, $3}' > QppUqvProj/Results/ROBUST/test/raw/QLmap1000
Create the map1000-max file (for single pick), example:
python3.6 repos/IRQPP/singleUQV.py QppUqvProj/Results/ROBUST/test/raw/QLmap1000 QppUqvProj/Results/ROBUST/test/raw/QLmap1000 -f max > QppUqvProj/Results/ROBUST/test/single/map1000-max
Create the fused (CombSUM) results file, example:
python3.7 repos/IRQPP/fusion.py QppUqvProj/Results/ROBUST/test/raw/QL.res QppUqvProj/Results/ROBUST/test/raw/logqlc.res > QppUqvProj/Results/ROBUST/test/fusion/QL.res
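CombSUM gives each document a fused score equal to the sum of its scores over the result lists being fused. The sketch below is not the repo's fusion.py. It assumes QL.res uses the six-column TREC run format (qid Q0 docno rank score tag), and it min-max normalizes each list before summing; that normalization is an assumption (fusion.py also reads logqlc.res, so it may normalize differently). parse_trec_run, min_max and comb_sum are hypothetical names, and choosing which lists to fuse for a topic (for example, all UQV variants of one information need) is left to the caller.

```python
from collections import defaultdict


def parse_trec_run(lines):
    """Parse TREC run lines into {qid: {docno: score}}.

    Assumes the standard six-column layout: qid Q0 docno rank score tag.
    """
    runs = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _, docno, _, score, _ = parts
        runs[qid][docno] = float(score)
    return dict(runs)


def min_max(scores):
    """Rescale one list's scores to [0, 1] (assumed normalization step)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(result_lists, k=1000):
    """CombSUM: sum normalized scores per document (absent from a list = 0).

    Returns the top-k (docno, fused score) pairs, ties broken by docno.
    """
    fused = defaultdict(float)
    for scores in result_lists:
        for doc, s in min_max(scores).items():
            fused[doc] += s
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Toy run: "v1" and "v2" stand for two query variants of the same topic
run = parse_trec_run([
    "v1 Q0 d1 1 -4.0 indri",
    "v1 Q0 d2 2 -5.0 indri",
    "v1 Q0 d3 3 -6.0 indri",
    "v2 Q0 d2 1 -3.0 indri",
    "v2 Q0 d3 2 -7.0 indri",
])
print(comb_sum([run["v1"], run["v2"]]))
```

Normalizing first matters here because QL scores are negative log probabilities whose range differs from query to query, so summing raw scores would let one list dominate.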

Help Files

Some of the help instructions still need to be updated.

usage: python3.6 generate_results.py --predictor PREDICTOR -c CORPUS -q QUERIES 

Full Results Pipeline Automation Generator

optional arguments:
  -h, --help            show this help message and exit
  --predictor predictor_name
                        predictor to run
  -q queries.xml, --queries queries.xml
                        path to queries xml res
  -c {ROBUST,ClueWeb12B}, --corpus {ROBUST,ClueWeb12B}
                        corpus (index) to work with
  --qtype {basic,single,aggregated,fusion}
                        The type of queries to run
  -m {pearson,spearman,kendall}, --measure {pearson,spearman,kendall}
                        default correlation measure type is pearson
  -t {basic,single,aggregated,fusion,all}, --table {basic,single,aggregated,fusion,all}
                        the LaTeX table to be printed
  --generate            generate new predictions
  --lists               generate new lists
  --calc                calc new UQV predictions
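Query performance prediction is conventionally evaluated by correlating predicted values with the actual per-query AP, and the -m option selects the coefficient. The following pure-Python sketch shows the three measures (Pearson, Spearman with average ranks for ties, Kendall's tau-b). It is not the repo's implementation. scipy.stats.pearsonr, spearmanr and kendalltau provide the same coefficients if SciPy is available. The predictor values and AP numbers in the usage lines are toy data.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects the denominator for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both: excluded from both denominator terms
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    n_pairs = concordant + discordant
    return (concordant - discordant) / math.sqrt((n_pairs + ties_x) * (n_pairs + ties_y))


MEASURES = {"pearson": pearson, "spearman": spearman, "kendall": kendall_tau_b}

predicted = [0.2, 0.5, 0.1, 0.9]  # toy predictor values, one per query
ap = [0.15, 0.40, 0.20, 0.70]     # toy per-query AP
for name, fn in MEASURES.items():
    print(name, round(fn(predicted, ap), 3))
```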

Currently Beta Version

About

The results of the work in this repo were published in the SIGIR 2019 paper "Information Needs, Queries, and Query Performance Prediction":

https://dl.acm.org/doi/10.1145/3331184.3331253


MIT License

Languages

Python 75.0%, C++ 23.8%, Shell 1.2%