julmaxi/summary_lq_analysis

Code for the EACL 2021 paper "How to Evaluate a Summarizer: Study Design and Statistical Analysis for Manual Linguistic Quality Evaluation".

Most experiments can be reproduced by running the accompanying Jupyter notebook. For significance analysis, run

scrips/r/analyse-ordinal.r anonymized_judgements/<data_file> crossed

Power analysis is not included in these steps because it is computationally expensive. To reproduce a single power analysis step, run

python -m summaryanalysis.design_power -b <batch count> -d <docs per batch> -a <annotators per doc> <model_file> out.csv
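
Because each invocation covers a single study design, several designs can be queued by generating one command per combination of batch count, documents per batch, and annotators per document. The following is a minimal sketch, not part of the repository: the helper names, the example grid values, and the output file naming scheme are assumptions made for illustration, and only the module name and the -b, -d, -a flags are taken from the command above. It prints the commands as a dry run rather than executing them.

```python
import itertools
import shlex
import sys


def build_power_command(batches, docs_per_batch, annotators_per_doc,
                        model_file, out_file):
    """Build the argv list for one design_power run.

    Only the module name and the -b/-d/-a flags come from the README;
    this helper itself is hypothetical.
    """
    return [
        sys.executable, "-m", "summaryanalysis.design_power",
        "-b", str(batches),
        "-d", str(docs_per_batch),
        "-a", str(annotators_per_doc),
        model_file,
        out_file,
    ]


def design_grid(batch_counts, docs_options, annotator_options):
    """Return every (batches, docs, annotators) combination to evaluate."""
    return list(itertools.product(batch_counts, docs_options, annotator_options))


# Example grid values are placeholders, not the designs used in the paper.
grid = design_grid([1, 2], [5, 10], [3])

for b, d, a in grid:
    # Hypothetical naming scheme: one CSV per design.
    out_file = f"power_b{b}_d{d}_a{a}.csv"
    cmd = build_power_command(b, d, a, "<model_file>", out_file)
    # Dry run: print the command instead of launching the expensive job.
    print(shlex.join(cmd))
```

To actually launch the runs, each argv list could be passed to subprocess.run(cmd, check=True) after replacing <model_file> with a real model file. Since every run is expensive, starting with a small grid is advisable.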
