princeton-nlp / ALCE

[EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627

The human evaluation results

HuuuNan opened this issue

HuuuNan commented:

Hi there,
The work is great, and thank you for sharing your code.
Would you share the human evaluation results from Section 6 and the script used to "evaluate the accuracy of your automatic metrics by treating the human annotations as gold labels"?
Thanks a lot.

gaotianyu1350 commented:

Hi,

Thanks for your interest in our work! We have added the detailed human evaluation results to the human_eval folder.

HuuuNan commented:

@gaotianyu1350 Thanks for your reply and the update. Could you please also share the script used to "evaluate the accuracy of your automatic metrics by treating the human annotations as gold labels", i.e., the script that computes the citation recall, citation precision, insufficient-citation, and irrelevant-citation metrics? That would be helpful for understanding your work correctly.

@howard-yen can you help share the script? Thanks!

howard-yen commented:

Hi @HuuuNan, I updated the human_eval directory with the script. Thanks for your interest in our project!
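
For readers who want the gist before opening the repo: below is a minimal sketch of the kind of agreement check being requested, i.e., scoring an automatic citation judgment against human annotations treated as gold labels. The file name (annotations.json), field names (human_support, auto_support), and JSON schema are hypothetical illustrations for this sketch, not the actual format used by the released human_eval scripts.

```python
# Minimal sketch: measure how often an automatic citation judgment agrees
# with human annotations, treating the human labels as gold.
# NOTE: the file name and field names below are assumptions for illustration;
# consult the human_eval scripts for the authors' actual schema and computation.
import json

def metric_accuracy(examples):
    """Accuracy of the automatic citation judgment against human gold labels."""
    correct = total = 0
    for ex in examples:
        for cit in ex["citations"]:
            gold = cit["human_support"]  # 1 if the human judged the citation as supporting
            pred = cit["auto_support"]   # 1 if the automatic metric judged it as supporting
            correct += int(gold == pred)
            total += 1
    return correct / total if total else 0.0

if __name__ == "__main__":
    # "annotations.json" is a hypothetical file of per-example annotations
    with open("annotations.json") as f:
        examples = json.load(f)
    print(f"Automatic-metric accuracy vs. human gold labels: {metric_accuracy(examples):.3f}")
```

The same loop could be extended to confusion-matrix counts to separate the two error directions (citations the humans rejected but the metric accepted, and vice versa); see the released human_eval scripts for the authors' actual computation.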