Source Bias

The official repository for the KDD 2024 paper "Neural Retrievers are Biased Towards LLM-Generated Content" (arXiv: https://arxiv.org/abs/2310.20501).

🌟 New Release! 🌟 Check out our latest project, "Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration", on GitHub. This benchmark includes 16 datasets, more than ten popular retrieval models, and easy-to-use evaluation tools. Dive into the repository for more details!

Citation

If you find our code or work useful for your research, please cite our paper:

@article{dai2024neural,
  title={Neural Retrievers are Biased Towards LLM-Generated Content},
  author={Dai, Sunhao and Zhou, Yuqi and Pang, Liang and Liu, Weihao and Hu, Xiaolin and Liu, Yong and Zhang, Xiao and Wang, Gang and Xu, Jun},
  journal={Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  year={2024}
}

Quick Start

  • For details of the datasets, please check the file datasets/README.md

  • For details of the evaluation code, please check the code in the folder evaluate/

  • For details of the dataloader code, please check the file beir/datasets/data_loader.py (see the sketch after this list)
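For example, loading a dataset follows the standard BEIR interface. Below is a minimal sketch in Python; the data_folder path is illustrative, so see datasets/README.md for the actual layout of this repository's data:

from beir.datasets.data_loader import GenericDataLoader

# Load corpus, queries, and relevance judgments in BEIR format
# (the path is a placeholder; adjust it to this repository's datasets/ layout).
corpus, queries, qrels = GenericDataLoader(data_folder="datasets/scifact").load(split="test")

print(f"{len(corpus)} documents, {len(queries)} queries")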

File Structure

.
├── beir  # * evaluation code from BEIR
│   ├── datasets  # * code for the dataloader
│   ├── reranking  # * code for reranking models
│   └── retrieval  # * code for lexical and dense retrieval models
├── datasets
│   ├── 0.2  # * corpus generated by the LLM with temperature 0.2
│   ├── 1.0  # * corpus generated by the LLM with temperature 1.0
│   └── qrels  # * relevance judgments for the queries
└── evaluate  # * code for evaluating different retrieval models

Quick Start Example with Contriever

# evaluate on the human-written corpus
python evaluate/evaluate_contriever.py --test_dataset scifact \
    --target human --candidate_lm human

# evaluate on the llama-2-7b-chat corpus
python evaluate/evaluate_contriever.py --test_dataset scifact \
    --target llama-2-7b-chat --candidate_lm llama-2-7b-chat

# evaluate metrics targeting human-written documents on the mixed corpora
python evaluate/evaluate_contriever.py --test_dataset scifact \
    --target human --candidate_lm human llama-2-7b-chat

# evaluate metrics targeting LLM-generated documents on the mixed corpora
python evaluate/evaluate_contriever.py --test_dataset scifact \
    --target llama-2-7b-chat --candidate_lm human llama-2-7b-chat
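Source bias can then be quantified by comparing the metrics from the two mixed-corpora runs above, i.e., the relative percentage gap between scores targeting human-written and LLM-generated documents. A minimal sketch of that comparison in Python (the function name relative_delta and the example numbers are ours, purely illustrative):

def relative_delta(metric_human: float, metric_llm: float) -> float:
    """Relative % gap between human-targeted and LLM-targeted metrics.

    Positive values indicate a preference for human-written documents;
    negative values indicate a bias toward LLM-generated content.
    """
    mean = (metric_human + metric_llm) / 2
    return (metric_human - metric_llm) / mean * 100

# e.g., NDCG@1 from the two mixed-corpora runs (placeholder values)
print(relative_delta(0.52, 0.61))  # ≈ -15.9: the retriever favors LLM-generated text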

Dependencies

This repository is built on top of BEIR and Sentence Transformers.

This repository has the following dependency requirements.

python==3.10.13
pandas==2.1.4
scikit-learn==1.3.2
evaluate==0.4.1
sentence-transformers==2.2.2
spacy==3.7.2
tiktoken==0.5.2
pytrec-eval==0.5

The required packages can be installed via pip install -r requirements.txt.
