lintool / robust04-analysis

Meta-Analysis of Robust04 Papers (Yang et al., SIGIR 2019)

Meta-Analysis of Robust04 Papers

This repo contains raw data from a meta-analysis of papers that used the test collection from the TREC 2004 Robust Track (Robust04), as described in:

  • Wei Yang, Kuang Lu, Peilin Yang, and Jimmy Lin. Critically Examining the "Neural Hype": Weak Baselines and the Additivity of Effectiveness Gains from Neural Ranking Models. Proceedings of the 42nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019), July 2019, Paris, France.

Methodology

We exhaustively examined every publication from 2005 to 2018 in the following venues to identify those that reported results on Robust04: SIGIR, CIKM, WWW, ICTIR, ECIR, KDD, WSDM, TOIS, IRJ, IPM, and JASIST. This was supplemented by Google Scholar searches to identify a few additional papers not in the venues indicated above. Our meta-analysis was conducted in January 2019, but after the paper acceptance we included a few more papers. A number of exclusion criteria were applied, best characterized as discarding corner cases: for example, papers that only used a subset of the topics or papers that had metrics plotted in a graph. In total, we examined 130 papers; of these, 109 papers contained extractable average precision values that formed the basis of the results reported below. Note that some papers did not report AP, and thus were excluded from consideration.

For each of the 109 papers, we extracted the highest average precision score achieved on Robust04 by the authors' proposed methods, regardless of experimental condition (ignoring oracle conditions and other unrealistic setups). We further categorized the papers into neural (18) and non-neural (91) approaches. Methods that used word embeddings but did not use neural networks directly in ranking were still considered "neural" in our classification. From each paper we also extracted the authors' baseline: in most cases, these were explicitly defined; if multiple were presented, we selected the best. If the paper did not explicitly mention a baseline, we selected the best comparison condition using a method not by the authors (or based on previous work).
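The extraction rules above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not the procedure actually used (the extraction was done by reading the papers). The `Condition` class, its fields, the `summarize` function, and the AP values in the usage note are all hypothetical and do not come from any paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass
class Condition:
    """One experimental condition reported in a paper (hypothetical schema)."""
    name: str
    ap: float
    by_authors: bool      # True if this is one of the authors' proposed methods
    oracle: bool = False  # True for oracle or otherwise unrealistic setups


def summarize(
    conditions: Iterable[Condition],
    explicit_baselines: Optional[Set[str]] = None,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (baseline AP, best AP) for one paper.

    best AP: highest AP among the authors' proposed methods, ignoring oracles.
    baseline AP: best of the explicitly named baselines if any were named,
    otherwise the best comparison condition not proposed by the authors.
    """
    conditions = list(conditions)
    proposed = [c.ap for c in conditions if c.by_authors and not c.oracle]
    if explicit_baselines:
        pool = [c.ap for c in conditions if c.name in explicit_baselines]
    else:
        pool = [c.ap for c in conditions if not c.by_authors]
    baseline = max(pool) if pool else None
    best = max(proposed) if proposed else None
    return baseline, best
```

For example, with made-up conditions `BM25` (0.25), `RM3` (0.28), `Ours` (0.30), and an oracle variant of `Ours` (0.35), `summarize` would report a baseline of 0.28 and a best AP of 0.30 when no baseline is named explicitly.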

Overview

Results Overview

A visualization of our meta-analysis is presented above. For each paper, we show the baseline and the best result as an empty circle and a filled circle (respectively), connected by a line. All papers are grouped by their publication year. Neural approaches are shown in blue, and non-neural approaches in red. We also show two regression trendlines, for non-neural (red) as well as neural approaches (blue). A number of reference conditions are plotted as horizontal lines: the best submitted run at the TREC 2004 Robust Track (TREC best) at 0.333 AP is shown as a solid black line, and the median TREC run under the "title" condition at 0.258 AP is shown as a dotted black line (TREC median). Finally, we show the effectiveness of an untuned RM3 run (i.e., default parameters) from the Anserini system.
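As a rough illustration of how the trendlines and reference lines relate to the data, the sketch below fits an ordinary least-squares line to (publication year, best AP) pairs and counts how many points lie above a reference value. The pairs are the neural rows that have a best AP in the results table of this document. Whether the plotted trendlines were computed exactly this way is an assumption, and the names `fit_line` and `count_above` are hypothetical.

```python
# Reference conditions quoted in the text above.
TREC_BEST_AP = 0.333
TREC_MEDIAN_AP = 0.258

# (publication year, best AP) for the neural papers with extractable AP values.
NEURAL_BEST_AP = [
    (2016, 0.267), (2016, 0.2364), (2016, 0.275), (2016, 0.279),
    (2016, 0.274), (2016, 0.291),
    (2017, 0.2837), (2017, 0.2761),
    (2018, 0.256), (2018, 0.256), (2018, 0.2831), (2018, 0.296),
    (2018, 0.2971), (2018, 0.2486), (2018, 0.2904), (2018, 0.272),
    (2018, 0.3124),
    (2019, 0.3278),
]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def count_above(points, threshold):
    """Number of points whose AP is strictly greater than the threshold."""
    return sum(1 for _, ap in points if ap > threshold)


if __name__ == "__main__":
    slope, intercept = fit_line(NEURAL_BEST_AP)
    print(f"neural trend: {slope:+.4f} AP per year")
    print("above TREC median:", count_above(NEURAL_BEST_AP, TREC_MEDIAN_AP))
    print("above TREC best:", count_above(NEURAL_BEST_AP, TREC_BEST_AP))
```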

Results

The following results table is generated via the Python script json_to_md.py, which summarizes the raw results in robust04_papers.json. The column "standard" indicates if the paper used a "standard" configuration of Robust04; a non-standard configuration might be an evaluation that uses only a subset of the topics. For the "non-standard" papers we did not extract effectiveness metrics, and in some of the "standard" papers effectiveness metrics were not easily extractable (e.g., they are presented in a graph). In both cases AP values in the table were left blank.

| paper | standard? | neural? | baseline AP | best AP |
|---|---|---|---|---|
| Diaz (CIKM 2005) | yes | no | 0.2626 | 0.2625 |
| Liu et al. (CIKM 2005) | yes | no | 0.327 | 0.3574 |
| He and Ounis (SIGIR 2005) | yes | no | 0.2858 | 0.2724 |
| Diaz and Metzler (SIGIR 2006) | yes | no | 0.3214 | 0.353 |
| Fang and Zhai (SIGIR 2006) | yes | no | 0.248 | 0.302 |
| Amati (ECIR 2006) | yes | no | 0.2984 | 0.3052 |
| Diaz (IRJ 2007) | yes | no | 0.2961 | 0.3068 |
| He and Ounis (TOIS 2007) | yes | no | 0.2777 | 0.2907 |
| Zhang et al. (CIKM 2007) | yes | no | 0.333 | 0.364 |
| Troy and Zhang (SIGIR 2007) | yes | no | 0.2327 | 0.2628 |
| Vechtomova and Karamuftuoglu (IPM 2008) | yes | no | 0.2799 | 0.2753 |
| Blanco and Barreiro (ECIR 2008) | yes | no | 0.2844 | 0.2967 |
| Bendersky and Croft (SIGIR 2008) | yes | no | 0.2569 | 0.262 |
| Meij et al. (SIGIR 2008) | yes | no | 0.243 | 0.2689 |
| Zighelnic and Kurland (SIGIR 2008) | yes | no | 0.299 | 0.293 |
| Xu et al. (SIGIR 2008) | yes | no | 0.2294 | 0.2335 |
| Wu et al. (IPM 2009) | yes | no | 0.3465 | 0.3522 |
| Clinchant and Gaussier (ICTIR 2009) | yes | no | 0.269 | 0.274 |
| Soskin et al. (ICTIR 2009) | yes | no | 0.307 | 0.288 |
| Collins-Thompson (ICTIR 2009) | yes | no |  |  |
| He and Ounis (CIKM 2009) | no | no |  |  |
| Zhu et al. (ECIR 2009) | yes | no |  |  |
| Lease et al. (ECIR 2009) | yes | no | 0.2591 | 0.2733 |
| Collins-Thompson (CIKM 2009) | yes | no | 0.2441 | 0.2451 |
| Xu et al. (SIGIR 2009) | yes | no | 0.2823 | 0.3002 |
| Zhu et al. (SIGIR 2009) | yes | no | 0.223 | 0.226 |
| Wang and Zhu (SIGIR 2009) | yes | no | 0.231 | 0.249 |
| Lease (SIGIR 2009) | yes | no | 0.3052 | 0.3182 |
| Kalmanovich and Kurland (SIGIR 2009) | yes | no | 0.304 | 0.307 |
| Cormack et al. (SIGIR 2009) | yes | no | 0.3652 | 0.3686 |
| Bendersky et al. (SIGIR 2009) | yes | no | 0.2584 | 0.2516 |
| Zheng and Fang (ECIR 2010) | yes | no | 0.2421 | 0.2578 |
| Bendersky et al. (WSDM 2010) | yes | no | 0.2661 | 0.2721 |
| Piwowarski and Frommholz (CIKM 2010) | yes | no | 0.242 | 0.228 |
| Lang et al. (CIKM 2010) | yes | no | 0.3057 | 0.3313 |
| Xue et al. (CIKM 2010) | yes | no | 0.2683 | 0.2737 |
| Dillon and Collins-Thompson (CIKM 2010) | yes | no | 0.2248 | 0.2256 |
| Clinchant and Gaussier (SIGIR 2010) | yes | no | 0.29 | 0.303 |
| Dang et al. (SIGIR 2010) | yes | no | 0.2538 | 0.2584 |
| Park and Croft (SIGIR 2010) | yes | no | 0.2517 | 0.2621 |
| Clinchant and Gaussier (IRJ 2011) | yes | no | 0.277 | 0.285 |
| Clinchant and Gaussier (ICTIR 2011) | yes | no | 0.294 | 0.301 |
| Hui et al. (ICTIR 2011) | no | no |  |  |
| Lv and Zhai (CIKM 2011) | yes | no | 0.2544 | 0.2553 |
| Park et al. (CIKM 2011) | yes | no | 0.2477 | 0.2786 |
| Karimzadehgan and Zhai (CIKM 2011) | yes | no |  |  |
| Hui et al. (CIKM 2011) | yes | no | 0.257 | 0.2304 |
| Kotov and Zhai (CIKM 2011) | yes | no |  |  |
| Lv and Zhai (CIKM 2011) | yes | no | 0.2543 | 0.2571 |
| Lv and Zhai (SIGIR 2011) | yes | no | 0.2543 | 0.2553 |
| Xue and Croft (SIGIR 2011) | yes | no | 0.2683 | 0.2755 |
| Blanco and Lioma (IRJ 2012) | yes | no | 0.2243 | 0.2329 |
| Lv and Zhai (ECIR 2012) | yes | no | 0.254 | 0.2543 |
| Bendersky et al. (WSDM 2012) | yes | no | 0.2956 | 0.3068 |
| Lv and Zhai (CIKM 2012) | yes | no | 0.2521 | 0.253 |
| Xue and Croft (SIGIR 2012) | yes | no | 0.2683 | 0.28 |
| Bendersky and Croft (SIGIR 2012) | yes | no | 0.2741 | 0.2779 |
| Huang et al. (IPM 2013) | yes | no | 0.2857 | 0.2916 |
| Na (IPM 2013) | yes | no | 0.2987 | 0.3225 |
| Xue and Croft (TOIS 2013) | yes | no | 0.2857 | 0.2788 |
| Clinchant and Gaussier (ICTIR 2013) | yes | no | 0.287 | 0.3 |
| Wu and Fang (ICTIR 2013) | no | no |  |  |
| Clinchant and Perronnin (ICTIR 2013) | no | no |  |  |
| Rousseau and Vazirgiannis (CIKM 2013) | yes | no | 0.2547 | 0.2403 |
| Raiber and Kurland (SIGIR 2013) | yes | no |  |  |
| Maxwell and Croft (SIGIR 2013) | yes | no | 0.2657 | 0.2732 |
| Deveaud et al. (SIGIR 2013) | yes | no |  |  |
| Symonds et al. (JASIST 2014) | yes | no | 0.2707 | 0.2869 |
| Brosseau-Villeneuve et al. (IRJ 2014) | yes | no | 0.2922 | 0.3136 |
| Kocabas et al. (IRJ 2014) | yes | no | 0.2718 | 0.2764 |
| Zhao et al. (TOIS 2014) | yes | no | 0.2562 | 0.2666 |
| Huston and Croft (CIKM 2014) | no | no |  |  |
| Ye and Huang (SIGIR 2014) | yes | no | 0.299 | 0.3082 |
| Dalton et al. (SIGIR 2014) | yes | no | 0.2938 | 0.3277 |
| Lv and Zhai (IRJ 2015) | yes | no | 0.2788 | 0.2797 |
| Na (TOIS 2015) | yes | no | 0.2813 | 0.2927 |
| Costa et al. (TOIS 2015) | no | no |  |  |
| Raiber et al. (ICTIR 2015) | yes | no | 0.235 | 0.237 |
| Diaz (ICTIR 2015) | yes | no | 0.2892 | 0.2767 |
| Lv (CIKM 2015) | yes | no | 0.2352 | 0.2422 |
| Zheng and Callan (SIGIR 2015) | yes | no | 0.2749 | 0.2851 |
| Ye and Huang (JASIST 2016) | yes | no | 0.299 | 0.3086 |
| Miao et al. (TOIS 2016) | yes | no | 0.2894 | 0.3035 |
| Diaz (ECIR 2016) | yes | no | 0.2726 | 0.2736 |
| Balaneshinkordan and Kotov (ECIR 2016) | yes | no | 0.2426 | 0.2503 |
| Cummins (WWW 2016) | yes | no | 0.296 | 0.314 |
| Raviv et al. (SIGIR 2016) | yes | no | 0.284 | 0.305 |
| Montazeralghaem et al. (SIGIR 2016) | yes | no | 0.2829 | 0.2979 |
| Ai et al. (SIGIR 2016) | yes | yes | 0.259 | 0.267 |
| Lu et al. (ICTIR 2016) | yes | no | 0.264 | 0.27 |
| Zamani and Croft (ICTIR 2016) | yes | yes | 0.219 | 0.2364 |
| Ai et al. (ICTIR 2016) | yes | yes |  |  |
| Zamani and Croft (ICTIR 2016) | yes | yes | 0.2677 | 0.275 |
| Balaneshin-kordan and Kotov (ICTIR 2016) | no | no |  |  |
| Guo et al. (CIKM 2016) | yes | yes | 0.255 | 0.279 |
| Balaneshin-kordan and Kotov (CIKM 2016) | no | no |  |  |
| Guo et al. (CIKM 2016) | yes | yes | 0.295 | 0.274 |
| Dehghani et al. (CIKM 2016) | yes | no | 0.2961 | 0.2945 |
| Anava et al. (CIKM 2016) | yes | no | 0.301 | 0.303 |
| Levi et al. (CIKM 2016) | yes | no |  |  |
| Zamani et al. (CIKM 2016) | yes | no | 0.282 | 0.2899 |
| Kuzi et al. (CIKM 2016) | yes | yes | 0.282 | 0.291 |
| Ariannezhad et al. (ECIR 2017) | yes | no | 0.2822 | 0.2926 |
| Yang and Fang (ICTIR 2017) | no | no |  |  |
| Ensan and Bagheri (WSDM 2017) | yes | no | 0.3278 | 0.3382 |
| Dehghani et al. (SIGIR 2017) | yes | yes | 0.2503 | 0.2837 |
| Zamani and Croft (SIGIR 2017) | yes | yes | 0.2593 | 0.2761 |
| Sherman and Efron (SIGIR 2017) | yes | no | 0.2639 | 0.2674 |
| Montazeralghaem et al. (SIGIR 2017) | yes | no | 0.2829 | 0.295 |
| Ariannezhad et al. (SIGIR 2017) | yes | no | 0.254 | 0.255 |
| Cummins (ICTIR 2017) | yes | no | 0.305 | 0.3 |
| Raiber and Kurland (ICTIR 2017) | yes | no | 0.288 | 0.289 |
| Na and Kim (IPM 2018) | yes | no | 0.3015 | 0.3122 |
| Hubert et al. (IPM 2018) | no | no |  |  |
| Van Gysel et al. (TOIS 2018) | no | no |  |  |
| Ai et al. (ECIR 2018) | yes | yes | 0.21 | 0.256 |
| Zhang et al. (ECIR 2018) | yes | yes | 0.254 | 0.256 |
| Zou et al. (ICTIR 2018) | yes | no | 0.244 | 0.257 |
| Zamani and Croft (ICTIR 2018) | yes | yes | 0.2499 | 0.2831 |
| Li and Jia (CIKM 2018) | yes | yes | 0.248 | 0.296 |
| Bagheri et al. (CIKM 2018) | yes | yes |  |  |
| Zamani et al. (CIKM 2018) | yes | yes | 0.2865 | 0.2971 |
| Roy et al. (CIKM 2018) | yes | yes | 0.252 | 0.2486 |
| Jian et al. (SIGIR 2018) | yes | no | 0.2642 | 0.2637 |
| Montazeralghaem et al. (SIGIR 2018) | yes | no | 0.2919 | 0.2986 |
| Li et al. (EMNLP 2018) | yes | yes | 0.2966 | 0.2904 |
| McDonald et al. (EMNLP 2018) | yes | yes | 0.258 | 0.272 |
| Dehghani et al. (ICLR 2018) | yes | yes | 0.2702 | 0.3124 |
| Yang et al. (arXiv 2019) | yes | yes | 0.3033 | 0.3278 |
| MacAvaney et al. (arXiv 2019) | yes | yes |  |  |
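
For readers who want to regenerate a table like the one above from their own data, the following is a minimal sketch of a JSON-to-Markdown conversion. It is not the repository's json_to_md.py, and the schema of robust04_papers.json is not described here: the record keys (`paper`, `standard`, `neural`, `baseline_ap`, `best_ap`) and the function names are assumptions made for illustration only.

```python
import json

HEADER = "| paper | standard? | neural? | baseline AP | best AP |"
SEPARATOR = "|---|---|---|---|---|"


def yes_no(flag):
    """Render a boolean as the 'yes'/'no' strings used in the table."""
    return "yes" if flag else "no"


def fmt_ap(value):
    """Render an AP value; missing values become an empty cell."""
    return "" if value is None else str(value)


def to_markdown(records):
    """Build a Markdown table from a list of dicts (hypothetical keys)."""
    lines = [HEADER, SEPARATOR]
    for r in records:
        lines.append("| {} | {} | {} | {} | {} |".format(
            r["paper"],
            yes_no(r["standard"]),
            yes_no(r["neural"]),
            fmt_ap(r.get("baseline_ap")),
            fmt_ap(r.get("best_ap")),
        ))
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumes the file holds a JSON list of records with the keys above.
    with open("robust04_papers.json") as f:
        print(to_markdown(json.load(f)))
```

Leaving the AP cells empty when a value is missing mirrors the blank entries in the table for non-standard papers and for papers whose metrics were not extractable.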
