This repository contains the source code and datasets for the CIKM 2022 paper "From Easy to Hard: A Dual Curriculum Learning Framework for Context-Aware Document Ranking" by Zhu et al.
Contextual information in search sessions is important for capturing users' search intents. Various approaches have been proposed to model user behavior sequences to improve document ranking within a session. Typically, training samples of (search context, document) pairs are drawn randomly in each training epoch. In reality, the difficulty of understanding a user's search intent and judging a document's relevance varies greatly from one search context to another. Mixing training samples of different difficulties may confuse the model's optimization process. In this work, we propose a curriculum learning framework for context-aware document ranking, in which the ranking model learns matching signals between the search context and the candidate document in an easy-to-hard manner. In doing so, we aim to guide the model gradually toward a global optimum. To leverage both positive and negative examples, two curricula are designed. Experiments on two real query log datasets show that our proposed framework significantly improves the performance of several existing methods, demonstrating the effectiveness of curriculum learning for context-aware document ranking.
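The easy-to-hard idea above can be sketched as a pacing schedule over difficulty-sorted training pairs. This is only an illustrative sketch, not the paper's actual implementation: the difficulty scores, the linear pacing function, and the `start_frac` parameter are our assumptions.

```python
# Illustrative easy-to-hard curriculum schedule (NOT the paper's exact code):
# samples are sorted by an assumed difficulty score, and each epoch trains on
# a growing fraction of the easiest samples until the full set is used.

def curriculum_pool(samples, difficulties, epoch, total_epochs, start_frac=0.3):
    """Return the training pool for `epoch`, ordered easiest-to-hardest.

    `samples` and `difficulties` are parallel lists; lower score = easier.
    The pool grows linearly from `start_frac` of the data to all of it.
    """
    order = sorted(range(len(samples)), key=lambda i: difficulties[i])
    # Linear pacing: fraction of the data made available at this epoch.
    frac = min(1.0, start_frac + (1.0 - start_frac) * epoch / max(1, total_epochs - 1))
    k = max(1, int(frac * len(samples)))
    return [samples[i] for i in order[:k]]

# Example: 10 (search context, document) pairs with synthetic difficulty scores.
data = [f"pair_{i}" for i in range(10)]
diff = [9, 2, 7, 1, 5, 8, 3, 6, 0, 4]

pool_first = curriculum_pool(data, diff, epoch=0, total_epochs=5)
pool_last = curriculum_pool(data, diff, epoch=4, total_epochs=5)
print(len(pool_first), len(pool_last))  # 3 10
```

The paper applies this style of scheduling to two curricula (over positive and negative examples); the sketch shows only the generic pacing mechanism.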
Authors: Yutao Zhu, Jian-Yun Nie, Yixuan Su, Haonan Chen, Xinyu Zhang, and Zhicheng Dou
- Python 3.8.5
- PyTorch 1.8.1 (with GPU support)
- Transformers 4.5.1
- pytrec-eval 0.5
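Assuming a pip-based setup, the versions above could be pinned in a `requirements.txt` like the following (the PyPI package names, e.g. `pytrec_eval` for pytrec-eval, are our assumption):

```
torch==1.8.1
transformers==4.5.1
pytrec_eval==0.5
```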
- Obtain the data (some data samples are provided in the data directory)
- Prepare the pretrained BERT model
  - BertModel
  - BertChinese
  - Save these models to the "pretrained_model" directory
- Prepare the pretrained COCA model
  - Download the contrastive pretrained model from the link
  - Save the checkpoint to the "pretrained_model" directory
- Train the model (on AOL)
python3 runModelCL.py --task aol --is_training --bert_model_path ./pretrained_model/BERT/ --pretrain_model_path ./pretrained_model/coca.aol
- Test the model (on AOL)
python3 runModelCL.py --task aol --bert_model_path ./pretrained_model/BERT/ --pretrain_model_path ./pretrained_model/coca.aol
- Test with our trained models
  - We provide checkpoints of our trained models on both the AOL and Tiangong-ST datasets for testing
If you use the code and datasets, please cite the following paper:
@inproceedings{ZhuNSCZD22,
author = {Yutao Zhu and
Jian{-}Yun Nie and
Yixuan Su and
Haonan Chen and
Xinyu Zhang and
Zhicheng Dou},
editor = {Mohammad Al Hasan and
Li Xiong},
title = {From Easy to Hard: {A} Dual Curriculum Learning Framework for Context-Aware
Document Ranking},
booktitle = {Proceedings of the 31st {ACM} International Conference on Information
{\&} Knowledge Management, Atlanta, GA, USA, October 17-21, 2022},
pages = {2784--2794},
publisher = {{ACM}},
year = {2022},
url = {https://doi.org/10.1145/3511808.3557328},
doi = {10.1145/3511808.3557328}
}