This repository contains a curated list of awesome references for learning-to-rank.
⛳ Methodology (METH) | 📘 Learning Theory (LT) | 🎯 Optimization (OPT) |
💻 Software (SW) | 🦆 Kaggle Competition (Kaggle) | ⌨️ Empirical Studies (ES) |
📊 Dataset (DATA) | 🥅 Loss Function (Loss) | 🌐 Deep Learning (DL) |
🔄 Ranking Aggregation or Synchronization (RA) |
[DATA][ES] Chapelle, O., & Chang, Y. (2011, January). Yahoo! learning to rank challenge overview. In Proceedings of the learning to rank challenge (pp. 1-24). PMLR.
- keywords: Yahoo dataset, query, document; [chapelle2011yahoo]
- memo: The paper provides an overview of the Yahoo! Learning to Rank Challenge. Winning methods: LambdaMART boosted tree models, LambdaRank neural nets, LogitBoost, ... (most of them are boosting methods).
[METH][Loss][LT] Herbrich, R., Graepel, T., & Obermayer, K. (2000). Large margin rank boundaries for ordinal regression. Advances in large margin classifiers, 88(2), 115-132.
- keywords: RankSVM, hinge loss, ordinal regression, pairwise approach, univariate scoring function; [herbrich2000large]
- memo: (i) formulation of RankSVM and ordinal regression with a univariate scoring function.
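The pairwise hinge loss underlying RankSVM can be sketched as follows; a minimal illustration assuming a linear scoring function `f(x) = w . x` (the function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def pairwise_hinge_loss(w, pairs):
    """Mean hinge loss over preference pairs (x_pref, x_other),
    where x_pref should be scored above x_other by f(x) = w . x
    with a margin of at least 1 (the RankSVM-style pairwise loss)."""
    losses = [max(0.0, 1.0 - (w @ x_pref - w @ x_other))
              for x_pref, x_other in pairs]
    return float(np.mean(losses))

# toy example: one feature, w favors larger feature values
w = np.array([1.0])
pairs = [(np.array([3.0]), np.array([1.0])),   # margin 2   -> loss 0
         (np.array([1.0]), np.array([1.5]))]   # margin -0.5 -> loss 1.5
# pairwise_hinge_loss(w, pairs) -> (0 + 1.5) / 2 == 0.75
```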
[METH][OPT] Freund, Y., Iyer, R., Schapire, R. E., & Singer, Y. (2003). An efficient boosting algorithm for combining preferences. Journal of machine learning research, 4(Nov), 933-969.
- keywords: RankBoost, pairwise approach, univariate scoring function; [freund2003efficient]
- memo: (i) formulation of RankBoost under pairwise approach with a univariate scoring function.
[METH][OPT][LT][Loss] Clémençon, S., Lugosi, G., & Vayatis, N. (2008). Ranking and empirical minimization of U-statistics. The Annals of Statistics, 36(2), 844-874.
- keywords: ranking, U-statistics, convex loss, pairwise, bivariate ranking function; [clemenccon2008ranking]
- memo: The paper formulates the ranking problem in a pairwise classification framework; consistency of empirical risk minimizers is established.
[METH][OPT] Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005, August). [Learning to rank using gradient descent](https://dl.acm.org/doi/pdf/10.1145/1102351.1102363?casa_token=j1eFCUL99SUAAAAA:LslBnjJ_iEXthdsANyIMBOooIlKo5ZO73bMi4YlFXnhoasIohj04IOghY7BRSc4QINzWS4nvrl92wA). In Proceedings of the 22nd international conference on Machine learning (pp. 89-96).
- keywords: RankNet [burges2005learning]
- memo: RankNet is a pairwise probabilistic model based on the scoring approach: a sigmoid function of the difference between two scores.
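The memo above translates directly into code; a minimal sketch of RankNet's pairwise probability and its cross-entropy training loss (function names are illustrative):

```python
import math

def ranknet_prob(s_i, s_j):
    """RankNet's pairwise probability that item i beats item j:
    a sigmoid of the score difference s_i - s_j."""
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))

def ranknet_loss(s_i, s_j, p_target):
    """Cross-entropy between the target preference p_target
    (1.0 if i should rank above j) and the predicted probability."""
    p = ranknet_prob(s_i, s_j)
    return -p_target * math.log(p) - (1.0 - p_target) * math.log(1.0 - p)

# equal scores give probability 0.5, i.e. no preference:
# ranknet_prob(2.0, 2.0) -> 0.5
```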
[METH][OPT] Burges, C. J. (2010). [From ranknet to lambdarank to lambdamart: An overview](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.180.634&rep=rep1&type=pdf). Learning, 11(23-581), 81.
- keywords: RankNet, LambdaRank, LambdaMART [burges2010ranknet]
[SW][DL][METH] Pasumarthi, R. K., Bruch, S., Wang, X., Li, C., Bendersky, M., Najork, M., ... & Wolf, S. (2019, July). Tf-ranking: Scalable tensorflow library for learning-to-rank. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 2970-2978). [ slides + GitHub ]
- keywords: neural learning-to-rank, handcrafted features; [pasumarthi2019tf]
- memo: Neural learning-to-rank software based on TensorFlow is developed; instead of learning with handcrafted features (dense features), the proposed software can directly learn from textual data (sparse features).
[Loss][LT] Kotlowski, W., Dembczynski, K., & Huellermeier, E. (2011, January). Bipartite ranking through minimization of univariate loss. In ICML.
- keywords: pairwise, consistency, scoring approach, bipartite ranking; [kotlowski2011bipartite]
- memo: (i) The paper points out the difference between scoring approaches and ranking approaches in learning-to-rank; (ii) the paper indicates that log-loss and exp-loss are consistent with the pairwise 0-1 loss in ranking, yet the regret is only upper bounded by a square root (which halves the convergence rate).
[METH][Loss][LT] Narasimhan, H., & Agarwal, S. (2013, January). On the Relationship Between Binary Classification, Bipartite Ranking, and Binary Class Probability Estimation. In NIPS (pp. 2913-2921).
- keywords: pairwise, scoring approach, bipartite ranking, probability estimation; [narasimhan2013relationship]
- memo: The paper investigates the relation among binary class probability estimation, binary classification, and bipartite ranking.
[Loss][LT][METH] Ravikumar, P., Tewari, A., & Yang, E. (2011, June). On NDCG consistency of listwise ranking methods. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (pp. 618-626). JMLR Workshop and Conference Proceedings.
- keywords: listwise, NDCG, Fisher-consistency; [ravikumar2011ndcg]
- memo: The paper investigates the Fisher consistency (but not regret consistency) of listwise ranking methods with respect to NDCG.
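For reference, NDCG@k (the metric whose consistency is studied above) can be computed as below; a minimal sketch using the common exponential-gain convention `2^rel - 1` with a `log2` position discount:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of a ranked list of relevance grades,
    with gain 2^rel - 1 and discount log2(position + 1)."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG of the given ranking normalized by the ideal (sorted) DCG."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# a perfectly sorted list attains NDCG 1.0:
# ndcg_at_k([3, 2, 1, 0], 4) -> 1.0
```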
[Loss][LT][METH] Gao, W., & Zhou, Z. H. (2015, June). On the consistency of AUC pairwise optimization. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
- keywords: pairwise, zero-one loss, Fisher-consistency, consistency, AUC, Bipartite ranking; [gao2015consistency]
- memo: (i) AUC can be formulated as a zero-one pairwise ranking problem; (ii) for ranking regret consistency: log loss (yes), exponential loss (yes), hinge loss (no), squared hinge loss (yes); (iii) convergence rate: log loss (square root of the 0-1 loss), exponential loss (square root of the 0-1 loss), hinge loss (no), squared hinge loss (??).
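Point (i) — AUC as one minus the pairwise zero-one ranking loss — can be illustrated by counting correctly ordered (positive, negative) pairs, with ties counted as one half (a standard identity; the function name is illustrative):

```python
def auc_pairwise(scores, labels):
    """AUC computed as the fraction of (positive, negative) pairs that
    the scorer orders correctly; ties count 1/2. This equals
    1 - (pairwise zero-one ranking loss)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))

# scores = [0.9, 0.8, 0.3, 0.1], labels = [1, 0, 1, 0]
# pairs: (0.9,0.8)->1, (0.9,0.1)->1, (0.3,0.8)->0, (0.3,0.1)->1
# auc_pairwise(...) -> 3/4 == 0.75
```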
[Loss][LT][METH] Dai, B., Shen, X., Wang, J., & Qu, A. (2019). Scalable collaborative ranking for personalized prediction. Journal of the American Statistical Association, 1-9.
- keywords: pairwise, zero-one loss, Fisher-consistency, consistency, Bipartite ranking; [dai2019scalable]
- memo: (i) The regrets of MAP, DCG, and Sum-Loss are all upper bounded by the pairwise 0-1 loss; that is, good performance under the pairwise 0-1 loss implies good performance under all the mentioned losses; (ii) convergence rate: truncated hinge loss (linear in the 0-1 loss).
[RA][METH][LT] Fanuel, M., & Suykens, J. A. (2019). Deformed Laplacians and spectral ranking in directed networks. Applied and Computational Harmonic Analysis, 47(2), 397-422.
- keywords: discrete laplacians, directed graphs, random walks, synchronization; [fanuel2019deformed]
- memo: (i) dilation Laplacians are shown to be useful tools for ranking in directed networks of pairwise comparisons; (ii) a deformation parameter enables emphasis of the top-k objects in the ranking.
[Loss][LT] Uematsu, K., & Lee, Y. (2017). On theoretically optimal ranking functions in bipartite ranking. Journal of the American Statistical Association, 112(519), 1311-1322.
- keywords: hinge loss, Bipartite ranking, negative example; [uematsu2017theoretically]
- memo: (i) Hinge loss is Fisher-consistent with respect to the pairwise zero-one loss in bipartite ranking (Theorem 5).
[Loss][LT][METH][RA] Waegeman, W., Pahikkala, T., Airola, A., Salakoski, T., Stock, M., & De Baets, B. (2012). A kernel-based framework for learning graded relations from data. IEEE Transactions on Fuzzy Systems, 20(6), 1090-1101.
- keywords: kernel, synchronization, weak stochastic transitivity, linear stochastic transitivity, pairwise approach, bivariate ranking function; [waegeman2012kernel]
- memo: (i) a bivariate kernel whose associated reproducing kernel Hilbert space (RKHS) satisfies linear stochastic transitivity (LST; see Definition IV.3), a sufficient condition for weak stochastic transitivity.
[RA][METH][LT] Shah, N. B., & Wainwright, M. J. (2017). Simple, robust and optimal ranking from pairwise comparisons. Journal of Machine Learning Research, 18(1), 7246-7283.
- keywords: pairwise approach, Borda counting; [shah2017simple]
- memo: (i) a Borda-counting-based method to assign scores to all items from pairwise comparisons; (ii) defines and analyzes the difficulty of top-k ranking for the proposed method.
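The Borda-counting idea in (i) can be sketched in a few lines: score each item by its number of pairwise wins, then rank by score; a minimal illustration assuming comparisons arrive as (winner, loser) pairs with each pair compared equally often (names here are illustrative):

```python
from collections import Counter

def borda_scores(comparisons, items):
    """Score each item by its number of pairwise wins (Borda count)."""
    wins = Counter(winner for winner, _ in comparisons)
    return {item: wins.get(item, 0) for item in items}

def borda_ranking(comparisons, items):
    """Items sorted by descending win count; ranking by such counts is
    the core idea of Borda-count estimators from pairwise comparisons."""
    scores = borda_scores(comparisons, items)
    return sorted(items, key=lambda it: -scores[it])

comps = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]
# wins: a=3, b=1, c=0
# borda_ranking(comps, ["a", "b", "c"]) -> ["a", "b", "c"]
```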
[METH][LT] Chen, Y., Fan, J., Ma, C., & Wang, K. (2019). Spectral method and regularized MLE are both optimal for top-K ranking. Annals of statistics, 47(4), 2204.
- keywords: top-K ranking; pairwise comparisons; spectral method; regularized MLE; entrywise perturbation; leave-one-out analysis; reversible Markov chains; [chen2019spectral]
- memo: scoring approach to ranking (model-based on (1.1)); minimax rate for ranking; gap assumption between top-k and top-(k+1).
[LT][Loss] Xia, F., Liu, T. Y., Wang, J., Zhang, W., & Li, H. (2008, July). Listwise approach to learning to rank: theory and algorithm. In Proceedings of the 25th international conference on Machine learning (pp. 1192-1199).
- keywords: ranking loss; listwise; consistency; order sensitive; [xia2008listwise]
- memo: Ranking losses for listwise learning-to-rank, and their theoretical consistency.