osirrc / jig

Jig for the Open-Source IR Replicability Challenge (OSIRRC)

Proposal for training hooks

lintool opened this issue

As a recap, here's where we're at: we have the init, index, and search hooks fleshed out reasonably well, with the following "canonical" execution sequence:

[init] -> [index: save-to-snapshot] -> [load-from-snapshot: search]

A proposal to add in a training hook is as follows:

[init] -> [index: save-to-snapshot1] -> [load-from-snapshot1: train: save-to-snapshot2]
       -> [load-from-snapshot2: test]

The train hook would get the topics and qrels as input. As part of the contract, the jig would manage the snapshotting, so from the perspective of the container it would be as if init, index, train, and test ran in an uninterrupted sequence.

The snapshotting allows the jig to efficiently retrain different models (if the image supports it), and to test on different held-out test sets.
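
To make the contract concrete, here's a rough sketch (not the actual jig code) of how the jig could manage those snapshots with `docker commit`; the image name, hook-invocation convention, and arguments are all placeholders for illustration:

```python
# Rough sketch (not the actual jig code) of snapshot management via
# `docker commit`. The image name, hook-invocation convention, and
# arguments are all assumptions for illustration.
import subprocess
import uuid


def run_hook(image, hook, args):
    """Run one hook in a container started from `image`, commit the
    resulting container as a new snapshot image, and return its tag."""
    container = "osirrc-{}-{}".format(hook, uuid.uuid4().hex[:8])
    subprocess.run(["docker", "run", "--name", container, image, hook] + args,
                   check=True)
    snapshot = "{}:snapshot-after-{}".format(image.split(":")[0], hook)
    subprocess.run(["docker", "commit", container, snapshot], check=True)
    subprocess.run(["docker", "rm", container], check=True)
    return snapshot


# [init] -> [index: save-to-snapshot1] -> [load-from-snapshot1: train: save-to-snapshot2]
#        -> [load-from-snapshot2: test]
# (init would run the same way before indexing; omitted here for brevity)
snapshot1 = run_hook("example/searcher:latest", "index",
                     ["--collection", "robust04"])
snapshot2 = run_hook(snapshot1, "train",
                     ["--topics", "topics.txt", "--qrels", "qrels.txt"])
run_hook(snapshot2, "test", ["--topics", "topics.test.txt"])
```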

Also, we would propose a cross-validation hook, e.g.,

[init] -> [index: save-to-snapshot] -> [load-from-snapshot: xvalidate]

The input to the cross-validation hook would contain the topics, qrels, and folds.
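
For illustration, here is one hypothetical shape that input could take; the file names, keys, and topic IDs below are placeholders, not a decided format:

```python
# Hypothetical shape of the xvalidate input; the file name, keys, and
# topic IDs are placeholders, not a decided format.
import json

xvalidate_input = {
    "topics": "topics.robust04.txt",   # full topic set
    "qrels": "qrels.robust04.txt",     # relevance judgments
    "folds": [                         # each fold lists its held-out topic IDs
        ["301", "302", "303"],
        ["304", "305", "306"],
    ],
}

with open("xvalidate.json", "w") as f:
    json.dump(xvalidate_input, f, indent=2)
```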

Thoughts?

Thanks for the recap!
For supervised models, or other approaches that implement best-model selection (such as nvsm), the train hook should receive an indication of which topics to use in the training and validation steps and the qrels too. In nvsm we handle that via a list of topic IDs parsed from an external file: our hook therefore receives a path to the training and validation ID lists and the path to the qrels file (to perform best-model selection according to the validation subset).
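
For illustration, a train hook entry point along those lines could look like the sketch below; the flag names are invented here and are not nvsm's actual interface:

```python
#!/usr/bin/env python3
# Sketch of a train hook entry point along these lines; the flag names
# are invented for illustration and are not nvsm's actual interface.
import argparse


def read_ids(path):
    """Read one topic ID per line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="train hook (sketch)")
    parser.add_argument("--train-ids", required=True,
                        help="file listing training topic IDs")
    parser.add_argument("--validation-ids", required=True,
                        help="file listing validation topic IDs")
    parser.add_argument("--qrels", required=True,
                        help="qrels file used for best-model selection")
    args = parser.parse_args()

    train_topics = read_ids(args.train_ids)
    validation_topics = read_ids(args.validation_ids)
    # ... train on train_topics, then keep the checkpoint that scores best
    # on validation_topics according to args.qrels ...
```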

Cross-validation, imho, is a good option to add (with the same considerations I made above) for models that do not require a long time to train.

@albpurpura

the train hook should receive an indication of which topics to use in the training and validation steps and the qrels too.

Yes, that's exactly the plan.

In my mind, there are two ways a team can implement training and/or cross-validation in their image:

  1. actually do the training.
  2. "fake it".

By (2), I mean that an image inspects the training/validation topics it is given and selects an appropriate pre-trained model to use. This is doable because I assume we'll have some sort of standard fold setting. The hook can just return an error if it is given an "unknown" split.

Obviously (1) should be preferable, but I think (2) would be acceptable also - i.e., better than nothing.
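
For concreteness, here's a minimal sketch of what (2) could look like inside a train hook; the fingerprints and model paths are placeholders:

```python
# Minimal sketch of option (2), "fake" training: fingerprint the requested
# training topic IDs, look up a matching pre-trained model, and fail on
# anything unknown. Fingerprints and model paths are placeholders.
import hashlib
import sys


def split_fingerprint(topic_ids):
    """Stable fingerprint of a topic-ID set, independent of ordering."""
    return hashlib.sha1(",".join(sorted(topic_ids)).encode()).hexdigest()


# Fingerprints of the standard splits this image ships pre-trained models for.
PRETRAINED = {
    "3f5a...": "/models/robust04.fold1.bin",   # placeholder values
    "9bc2...": "/models/robust04.fold2.bin",
}


def select_model(train_topic_ids):
    key = split_fingerprint(train_topic_ids)
    if key not in PRETRAINED:
        # Unknown split: refuse rather than silently using the wrong model.
        sys.exit("unknown training split: no matching pre-trained model")
    return PRETRAINED[key]
```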

Reactions?

@lintool I agree with you on the implementation of the "fake" training. One could load a pre-trained model that we'd share: if that is provided, no actual training is performed; otherwise, we train a new model.

You also wrote:

This is doable because I assume we'll have some sort of standard fold setting.

Do you mean that we could provide a standard split of the topics for training and validation (and consequently testing) for all systems? That is a good idea, as it would make the performance of different models comparable.

Loading pre-trained models for cross-validation could also be done, but only if the folds are provided in advance, and of course that would be the only supported way to xvalidate a system with pre-trained models. We should also consider, before deciding to do this, that additional pre-trained models will take up a lot of space.

For example, for Robust04 I would suggest the two-fold and five-fold settings here:
https://github.com/castorini/anserini/blob/master/docs/experiments-forum2018.md

This would make our results comparable to previous work.

If the image gets a fold configuration it doesn't recognize (and doesn't have a trained model for)... it can just throw an error.