
🍄 UHGEval

Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation

  • Safety: Ensuring the security of experimental data is of utmost importance.
  • Flexibility: Easily expandable, with all modules replaceable.

Quick Start

Get started quickly with a 20-line demo program.

  • UHGEval requires Python>=3.10.0
  • pip install -r requirements.txt
  • Using uhgeval/configs/example_config.py as a template, create uhgeval/configs/real_config.py and configure the OpenAI GPT section.
  • Run demo.py
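The steps above can be run as a short shell session. This is a sketch, not the project's documented procedure: it assumes the repository has already been cloned, that a POSIX-compatible shell is in use, and that Python >= 3.10 is on PATH.

```shell
# Sketch of the Quick Start steps above (assumes the repo is already cloned).
python --version                  # should report 3.10 or newer
pip install -r requirements.txt   # install dependencies

# Copy the example config, then edit the OpenAI GPT section by hand:
cp uhgeval/configs/example_config.py uhgeval/configs/real_config.py

python demo.py                    # run the 20-line demo
```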

Advanced Usage

Use run_uhgeval.py or run_uhgeval_future.py to gain a comprehensive understanding of this project. The former is provisional code slated for removal; the latter is a command-line executable script intended to replace it.
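Since run_uhgeval_future.py is command-line executable, it can be invoked like any Python CLI script. The flag below is an assumption based on common convention, not a documented option; consult the script's own help output for its real interface.

```shell
# Hypothetical invocation; only the conventional --help flag is assumed here,
# which in most Python CLI scripts prints the supported options.
python run_uhgeval_future.py --help
```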

Results for Experiment-20231117

The original experimental results are in ./archived_experiments/20231117.

Contributions

Although we have performed thorough automatic annotation and manual verification, our XinhuaHallucinations dataset, with over 5,000 data points, may still contain errors or imperfections. We encourage you to open issues or submit pull requests to help us improve the dataset's consistency. Contributors may also receive corresponding recognition and rewards.

TODOs

  • llm, metric: enable loading from HuggingFace
  • config: use config files to make experiments convenient
  • TruthfulQA: add the new dataset and corresponding evaluators
  • another repo: dataset creation pipeline
  • contribution: contribute to OpenCompass

Citation

@article{UHGEval,
    title={UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation},
    author={Xun Liang and Shichao Song and Simin Niu and Zhiyu Li and Feiyu Xiong and Bo Tang and Zhaohui Wy and Dawei He and Peng Cheng and Zhonghao Wang and Haiying Deng},
    journal={arXiv preprint arXiv:2311.15296},
    year={2023},
}


https://arxiv.org/abs/2311.15296

License: Apache License 2.0
