Evaluation results, metrics and scripts
huangyangyi opened this issue
Hi everyone,
We are sorry to report that we have found problems with our evaluation protocols: they are not well aligned with those of the baseline methods reported in our arXiv paper. Once we are ready (within weeks), we will update the repo with the results (both images and the correct metrics) of ELICIT and all other baselines, together with the evaluation scripts.
We apologize for any inconvenience this may cause and appreciate your understanding.
We have uploaded our results and all results of baselines to Google Drive, and updated our arXiv paper and videos.