ILDAE: Instance-Level Difficulty Analysis of Evaluation Data

Public repository associated with ILDAE: Instance-Level Difficulty Analysis of Evaluation Data, ACL 2022.

Abstract

Knowledge of questions' difficulty level helps a teacher in several ways, such as estimating students' potential quickly by asking carefully selected questions and improving the quality of examinations by modifying trivial and hard questions.

Can we extract such benefits of instance difficulty in Natural Language Processing?

To this end, we conduct Instance-Level Difficulty Analysis of Evaluation data (ILDAE) in a large-scale setup of 23 datasets and demonstrate its five novel applications:

conducting efficient-yet-accurate evaluations with fewer instances, saving computational cost and time,
improving the quality of existing evaluation datasets by repairing erroneous and trivial instances,
selecting the best model based on application requirements,
analyzing dataset characteristics for guiding future data creation,
estimating Out-of-Domain performance reliably.

Comprehensive experiments for these applications yield several interesting findings. For example, evaluation using just 5% of the instances (selected via ILDAE) achieves as high as 0.93 Kendall correlation with evaluation using the complete dataset, and computing weighted accuracy using difficulty scores leads to 5.2% higher Kendall correlation with Out-of-Domain performance. We release our computed difficulty scores (https://github.com/nrjvarshney/ILDAE) to encourage research in important yet understudied areas such as efficient evaluations and difficulty analysis.
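
To make the two quantities in the findings above concrete, here is a minimal Python sketch of a difficulty-weighted accuracy and of a Kendall rank correlation between two sets of model scores. It is an illustration under stated assumptions, not the code used in the paper: the function names `weighted_accuracy` and `kendall_tau`, the choice to weight each instance linearly by its difficulty score, the tau-a variant of Kendall correlation, and all numbers in the usage example are hypothetical.

```python
from itertools import combinations


def weighted_accuracy(correct, difficulty):
    """Accuracy in which each instance counts in proportion to its difficulty.

    Assumption: linear weighting by the difficulty score. The weighting
    actually used in the paper may differ.
    """
    if len(correct) != len(difficulty):
        raise ValueError("correct and difficulty must have the same length")
    total = sum(difficulty)
    if total == 0:
        raise ValueError("difficulty scores must not sum to zero")
    # Sum the weights of the instances the model answered correctly.
    return sum(d for c, d in zip(correct, difficulty) if c) / total


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Tied pairs (sign == 0) count toward neither total.
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Hypothetical toy data: per-instance correctness of three models on
    # four evaluation instances, plus made-up difficulty scores.
    difficulty = [0.1, 0.2, 0.7, 0.9]
    predictions = {
        "model_a": [True, True, False, False],
        "model_b": [True, False, True, True],
        "model_c": [True, True, True, False],
    }
    in_domain = [weighted_accuracy(p, difficulty) for p in predictions.values()]
    # Made-up Out-of-Domain accuracies for the same three models.
    out_of_domain = [0.41, 0.63, 0.55]
    print("weighted accuracies:", in_domain)
    print("Kendall tau vs. OOD:", kendall_tau(in_domain, out_of_domain))
```

A rank correlation is a natural fit here because the comparison is about whether two evaluations order the models the same way, not whether the absolute scores match.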

Difficulty Scores

To be added soon.
