PyTerrier
A Python API for Terrier
Installation
Linux
- Make sure that the JAVA_HOME environment variable is set to the location of your Java installation, then:
pip install python-terrier
macOS
- You need to have Java installed. Pyjnius/PyTerrier will pick up the location automatically.
pip install python-terrier
Windows
PyTerrier is not available for Windows because pytrec_eval isn't available for Windows. If you can compile & install pytrec_eval yourself, it should work fine.
Colab notebooks
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
!pip install python-terrier
Indexing
Indexing TREC formatted collections
You can create an index from a TREC-formatted collection using TRECCollectionIndexer.
For TXT, PDF, Microsoft Word, and other files, you can use FilesIndexer.
For a Pandas DataFrame, you can use DFIndexer.
See examples at:
https://colab.research.google.com/drive/17WpzhtlMj1U2UJku-RaO2axNsUFhPI6z
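As a sketch of what DFIndexer consumes, the toy DataFrame below (document texts and the ./df_index path are invented for illustration) uses the docno/text columns PyTerrier expects:

```python
import pandas as pd

# Toy collection: DFIndexer indexes a text column keyed by a "docno" column.
df = pd.DataFrame({
    "docno": ["d1", "d2", "d3"],
    "text": [
        "the quick brown fox",
        "jumped over the lazy dog",
        "terrier is an information retrieval platform",
    ],
})

# With PyTerrier initialised (pt.init()), indexing would look like:
#   indexref = pt.DFIndexer("./df_index").index(df["text"], df["docno"])
# and indexref can then be passed to pt.BatchRetrieve.
```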
Retrieval and Evaluation
topics = pt.Utils.parse_trec_topics_file(topicsFile)
qrels = pt.Utils.parse_qrels(qrelsFile)
BM25_br = pt.BatchRetrieve(index, "BM25")
res = BM25_br.transform(topics)
pt.Utils.evaluate(res, qrels, metrics = ['map'])
See examples at: https://colab.research.google.com/drive/1yime_0D21Q-KzFD4IbsRzTvjRbo9vz4I
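It can help to see the shape of the DataFrames flowing through this pipeline. The stand-in below (query texts, docnos, and scores are made up) mirrors the columns PyTerrier uses:

```python
import pandas as pd

# Topics, as produced by pt.Utils.parse_trec_topics_file: one row per query.
topics = pd.DataFrame({
    "qid": ["1", "2"],
    "query": ["information retrieval", "learning to rank"],
})

# BM25_br.transform(topics) returns a ranking per query; a hand-made stand-in:
res = pd.DataFrame({
    "qid":   ["1", "1", "2"],
    "docno": ["d1", "d2", "d3"],
    "score": [4.2, 3.1, 2.5],
    "rank":  [0, 1, 0],
})

# pt.Utils.evaluate(res, qrels, metrics=['map']) would then score res against
# the relevance assessments in qrels.
```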
Experiment - Perform Retrieval and Evaluation with a single function
We provide an Experiment function, which allows you to compare multiple retrieval approaches on the same queries & relevance assessments:
pt.Experiment(topics, [BM25_br, PL2_br], eval_metrics, qrels)
More examples are provided at: https://colab.research.google.com/drive/15oG7HwyYCBFuborjmfYglea0VLkUjyK-
Learning to Rank
First create a FeaturesBatchRetrieve(index, features) object with the desired features.
Call the transform(topics_set) function with the train, validation and test topic sets to get dataframes with the feature scores, and use them to train your chosen model.
Use your trained model to predict the scores of the test topics and evaluate the result with pt.Utils.evaluate().
BM25_with_features_br = pt.FeaturesBatchRetrieve(index, ["WMODEL:BM25F", "WMODEL:PL2F"], controls={"wmodel" : "BM25"})
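The train/predict steps above can be sketched as follows. The feature vectors and labels are invented, and a plain least-squares fit stands in for whatever LTR model you choose; in a real pipeline the features column comes from FeaturesBatchRetrieve.transform() and the labels from the qrels:

```python
import numpy as np
import pandas as pd

# Stand-in for FeaturesBatchRetrieve.transform(train_topics) joined with qrels:
# one feature vector per (query, document) pair, plus a relevance label.
train = pd.DataFrame({
    "qid":      ["1", "1", "2", "2"],
    "docno":    ["d1", "d2", "d3", "d4"],
    "features": [np.array([4.2, 3.9]), np.array([1.1, 0.8]),
                 np.array([3.5, 2.9]), np.array([0.5, 1.0])],
    "label":    [1, 0, 1, 0],
})

X = np.stack(train["features"].to_list())
y = train["label"].to_numpy()

# Fit the simplest possible scoring model: a linear least-squares fit.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# At test time, X_test @ w yields the "score" column you would pass,
# together with the qrels, to pt.Utils.evaluate().
scores = X @ w
```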
LTR_pipeline
Create a LTR_pipeline object with arguments:
- Index reference or path to index on disk
- Weighting model name
- Features list
- Qrels
- LTR model
Call the fit() method on the created object with the training topics.
Evaluate the results with the Experiment function using the test topics.
pt.LTR_pipeline(index, model, features, qrels, LTR)
More learning to rank examples are provided at: https://colab.research.google.com/drive/1KwHoahx_i0vax9fnCZpLP-JmI9jvSoey
Credits
- Alex Tsolov, University of Glasgow
- Craig Macdonald, University of Glasgow