ruanqin0706 / UserRecSimulation

Unveiling the Relationship between News Recommendation Algorithms and Media Bias: A Simulation-based Analysis of the Evolution of Bias Prevalence

Overview:

This repository contains the source code and data for the paper titled "Unveiling the Relationship between News Recommendation Algorithms and Media Bias: A Simulation-based Analysis of the Evolution of Bias Prevalence". This research aims to investigate the relationship between news recommendation algorithms and media bias. We designed a news recommendation simulation framework to evaluate the impact of media bias on different recommendation algorithms under different user choice strategies. The project comprises four folders: Articles, Users, Recommenders, and Observers.

Articles

The news dataset is based on the SemEval-2019 Task 4 Hyperpartisan Dataset. The datasets used in the experiments are located in the articles/processed_data directory.

Users

The code for generating synthetic users is included in the users directory. The user groups tested in the experiments are in the users/synthetic_user_groups directory.

Recommenders

The NAML, NPA, and NRMS algorithms use the official implementations provided by Microsoft Recommenders. The FIM and PLM-empowered algorithms were re-implemented in PyTorch.

Observers

This section includes simulation scripts, analyzers, and results. The simulation folder implements the feedback loop interaction between users, news recommendation algorithms, and candidate news sets. The user choice strategies reported in the paper are implemented in the observer/simulation/simulate_feedback.py file. The scripts folder connects the observation process, while the analyzer folder analyzes the results of the evolution of bias prevalence in users' browsing histories. The analyzed results are recorded in the results folder.
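As a rough illustration of the feedback loop described above, the following sketch runs a toy simulation in which a user repeatedly receives recommendations, picks one according to a choice strategy, and has the pick appended to the browsing history. It is not the code in observer/simulation/simulate_feedback.py: the function names, the `topic` and `hyperpartisan` fields, the topic-overlap ranking, and the two strategies (`choose_top`, `choose_random`) are all assumptions made for this example, and bias prevalence is taken here to be the share of hyperpartisan articles in the history.

```python
import random


def bias_prevalence(history):
    """Share of hyperpartisan articles in a browsing history (0.0 if empty)."""
    if not history:
        return 0.0
    return sum(1 for art in history if art["hyperpartisan"]) / len(history)


def toy_recommend(history, candidates, k):
    """Toy stand-in for a trained recommender (assumption, not NAML/NPA/NRMS):
    rank candidates whose topic already appears in the history first."""
    seen_topics = {art["topic"] for art in history}
    ranked = sorted(candidates, key=lambda art: art["topic"] in seen_topics, reverse=True)
    return ranked[:k]


def choose_top(recs, rng):
    """Hypothetical strategy: always click the top-ranked recommendation."""
    return recs[0]


def choose_random(recs, rng):
    """Hypothetical strategy: click one recommendation uniformly at random."""
    return rng.choice(recs)


def simulate(history, candidates, choose, rounds, k=3, seed=0):
    """Run the recommend -> choose -> update-history loop.

    Returns the final history and the bias prevalence after each round.
    """
    rng = random.Random(seed)
    history = list(history)   # copy so the caller's data is not mutated
    pool = list(candidates)
    trace = [bias_prevalence(history)]
    for _ in range(rounds):
        if not pool:
            break
        recs = toy_recommend(history, pool, k)
        picked = choose(recs, rng)
        history.append(picked)  # the click feeds back into later rankings
        pool.remove(picked)     # an article is shown to a user at most once
        trace.append(bias_prevalence(history))
    return history, trace


start_history = [{"id": "h1", "topic": "politics", "hyperpartisan": True}]
candidate_pool = [
    {"id": "c1", "topic": "sports", "hyperpartisan": False},
    {"id": "c2", "topic": "politics", "hyperpartisan": True},
    {"id": "c3", "topic": "politics", "hyperpartisan": False},
]
final_history, trace = simulate(start_history, candidate_pool, choose_top, rounds=2)
print([art["id"] for art in final_history], trace)
```

Swapping `choose_top` for `choose_random` changes only the user side of the loop, which is the kind of comparison across user choice strategies the framework is built for.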

Media Bias Aware Dataset

Generating the Media Bias Aware News Recommendation Dataset

We utilized the Hyperpartisan News Detection Dataset, released with the SemEval-2019 Task 4 Hyperpartisan detection task, due to its extensive bias labels. To ensure accurate bias labels, we used the Overlap-checking (1:1) model, retaining only articles where the distant supervision bias labels matched the model's predictions. This validation process resulted in 409,757 articles.
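The label-validation step above amounts to an agreement filter. A minimal sketch, assuming each article carries a distant supervision label and a model prediction as booleans (the `Article` class and its field names are hypothetical, and the Overlap-checking model itself is not reproduced here):

```python
from dataclasses import dataclass


@dataclass
class Article:
    article_id: str
    distant_label: bool    # distant supervision label: True = hyperpartisan
    predicted_label: bool  # label predicted by the bias classifier


def keep_consistent(articles):
    """Retain only articles whose distant label agrees with the model prediction."""
    return [a for a in articles if a.distant_label == a.predicted_label]


sample = [
    Article("a1", True, True),
    Article("a2", True, False),   # labels disagree, so this one is dropped
    Article("a3", False, False),
]
kept = keep_consistent(sample)
print([a.article_id for a in kept])
```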

These articles span from 1960 to 2018, with a sparse distribution in earlier years. We focused on articles from May 1, 2017, to December 31, 2017, resulting in a subset of 72,940 news articles, ensuring a consistent daily news flow. We processed this subset by removing HTML tags and special characters and generating news summaries using PEGASUS.
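The date filtering and text cleaning described above could look roughly like the sketch below. The regular expressions and the set of characters treated as "special" are assumptions for illustration, not the repository's actual rules, and the PEGASUS summarization step is left out.

```python
import html
import re
from datetime import date

# Date window stated in the text above.
START, END = date(2017, 5, 1), date(2017, 12, 31)

TAG_RE = re.compile(r"<[^>]+>")                    # HTML tags
SPECIAL_RE = re.compile(r"[^\w\s.,;:!?'\"()-]")    # assumption: keep words and basic punctuation
SPACE_RE = re.compile(r"\s+")


def in_window(published: date) -> bool:
    """True if the article falls inside the May 1 to December 31, 2017 window."""
    return START <= published <= END


def clean_text(raw: str) -> str:
    """Strip HTML tags, decode entities, drop special characters, collapse whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    text = SPECIAL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


print(in_window(date(2017, 6, 1)))
print(clean_text("<p>Hello &amp; welcome!</p>"))
```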

We used Latent Dirichlet Allocation (LDA) to categorize the articles into 20 news themes, with the number of themes selected based on perplexity scores. This dataset was then fed into the simulation framework, with a cut-off date of June 24, 2017.

  • News Recommendation Dataset: Includes user-item interaction records from May 1 to June 24, providing users' reading histories and interacted news articles.

    • Training Split: Data from May 1 to June 17, used to train news recommendation algorithms.
    • Evaluation Split: Data from June 17 to June 24, used to evaluate the trained recommendation algorithms.
  • Candidate News Dataset: News articles published from June 25 to December 31, presented to users during simulations.
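The date-based partition listed above can be sketched as a small helper. Because June 17 appears in both the training and evaluation ranges, this example assumes it belongs to the training split; that boundary choice, like the function and constant names, is an assumption and not taken from the repository.

```python
from datetime import date

TRAIN_START = date(2017, 5, 1)
TRAIN_END = date(2017, 6, 17)       # assumption: June 17 itself counts as training
EVAL_END = date(2017, 6, 24)        # simulation cut-off date
CANDIDATE_END = date(2017, 12, 31)


def assign_split(day: date) -> str:
    """Map a date to 'train', 'eval', 'candidate', or 'out_of_range'."""
    if TRAIN_START <= day <= TRAIN_END:
        return "train"
    if TRAIN_END < day <= EVAL_END:
        return "eval"
    if EVAL_END < day <= CANDIDATE_END:
        return "candidate"
    return "out_of_range"


for d in (date(2017, 5, 1), date(2017, 6, 20), date(2017, 9, 1)):
    print(d.isoformat(), assign_split(d))
```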

Citation

If you use this code or the techniques presented in our research, please cite our paper as follows:

@inproceedings{ruan2023unveiling,
  title={Unveiling the Relationship Between News Recommendation Algorithms and Media Bias: A Simulation-Based Analysis of the Evolution of Bias Prevalence},
  author={Ruan, Qin and Mac Namee, Brian and Dong, Ruihai},
  booktitle={International Conference on Innovative Techniques and Applications of Artificial Intelligence},
  pages={210--215},
  year={2023},
  organization={Springer}
}

If you use the Media Bias Aware Dataset, please cite it as follows:

@dataset{ruan_2024_11168981,
  author       = {Ruan, Qin and
                  Mac Namee, Brian and
                  Dong, Ruihai},
  title        = {Media Bias Aware Simulation Dataset},
  month        = may,
  year         = 2024,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.11168981},
  url          = {https://doi.org/10.5281/zenodo.11168981}
}

Contact

For any queries regarding the code or research, please contact:
