News Recommendation with
Category Description by a Large Language Model

Anonymous Author(s)

This repository is the official implementation for the paper: News Recommendation with Category Description by a Large Language Model. (to appear)

Overview

In this study, we propose a novel approach that uses Large Language Models (LLMs) to automatically generate descriptive texts for news categories, which are then used to enhance news recommendation performance. Comprehensive experiments show that the proposed method improves AUC by up to 5.8% compared to the baselines.

The paper will be published soon. We will update the README with more details and a link to the paper once it becomes available.

Directories

$ tree -L 2
.
├── LICENSE
├── README.md
├── dataset/ # MIND dataset and its download script
│   ├── download_mind.py
│   ├── generated/
│   └── mind/
├── pyproject.toml
├── requirements-dev.lock
├── requirements.lock
├── scripts/ # Script (Bash) for the experiment
│   ├── train_naml.sh
│   ├── train_npa.sh
│   └── train_nrms.sh
├── src/
│   ├── config/ # Configuration
│   ├── const/
│   ├── evaluation/ # Evaluation Metrics: nDCG, AUC, MRR
│   │   └── RecEvaluator.py
│   ├── experiment/ # Experiment Command
│   │   ├── generation/
│   │   └── train.py
│   ├── mind/ # Loading the dataset
│   │   ├── CategoryAugmentedMINDDataset.py
│   │   ├── MINDDataset.py
│   │   └── dataframe.py
│   ├── recommendation/ # Recommendation Models by PyTorch & Transformers
│   │   ├── __init__.py
│   │   ├── common_layers/
│   │   ├── naml/
│   │   ├── npa/
│   │   └── nrms/
│   └── utils/
└── test/ # Unit test
    ├── evaluation
    ├── mind
    └── recommendation

Preparation

Requirements

Rye

The project also works with Python v3.11.3 and pip.

Setup

First, install the dependencies by running:

$ rye sync

Next, set the PYTHONPATH environment variable:

$ export PYTHONPATH=$(pwd)/src:$(pwd)

Download MIND dataset

We use MIND (Microsoft News Dataset) for training and validating the news recommendation models. You can download it by executing dataset/download_mind.py.

$ rye run python ./dataset/download_mind.py

The script downloads the MIND dataset from an external site and then extracts it.

If the script runs successfully, the dataset folder will be structured as follows:

./dataset/
├── download_mind.py
└── mind
    ├── large
    │   ├── test
    │   ├── train
    │   └── val
    ├── small
    │   ├── train
    │   └── val
    └── zip
        ├── MINDlarge_dev.zip
        ├── MINDlarge_test.zip
        ├── MINDlarge_train.zip
        ├── MINDsmall_dev.zip
        └── MINDsmall_train.zip
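
To check that the extraction produced the layout shown above, a small script along these lines may help. It is a minimal sketch, not part of the repository: `EXPECTED_DIRS` and `missing_mind_dirs` are hypothetical names, and the list of directories is simply copied from the listing above.

```python
from pathlib import Path

# Sub-directories that the listing above shows under dataset/mind after extraction.
EXPECTED_DIRS = [
    "large/test",
    "large/train",
    "large/val",
    "small/train",
    "small/val",
]


def missing_mind_dirs(mind_root: Path) -> list[str]:
    """Return the expected sub-directories that do not exist under mind_root."""
    return [d for d in EXPECTED_DIRS if not (mind_root / d).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_mind_dirs(Path("dataset/mind"))
    print("all splits present" if not missing else f"missing: {missing}")
```

An empty result means every split directory from the listing is in place; anything else names the splits that still need to be downloaded or extracted.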

Generate Category Description by GPT-4

This step requires an OpenAI API key. Please follow the OpenAI documentation to obtain one.

If you cannot obtain an API key, use the category descriptions already provided in the repository (category_description_gpt4.json), which are the output of this step.

First, set your OpenAI API key as the OPENAI_API_KEY environment variable:

$ export OPENAI_API_KEY={YOUR_OPENAI_API_KEY}

Then, generate the category descriptions by running:

$ rye run python ./src/experiment/generation/gpt_based_text_generation.py

After the script finishes, the file category_description_gpt4.json should have been generated under the dataset/generated directory.
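
To confirm that the generated file is valid JSON, it can be loaded and summarized as sketched below. The README does not describe the schema of category_description_gpt4.json, so this example deliberately assumes nothing about it and only reports the top-level shape; `summarize_json` is a hypothetical helper, not a function from the repository.

```python
import json
from pathlib import Path


def summarize_json(path: Path) -> str:
    """Load a JSON file and describe its top-level shape without assuming a schema."""
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        return f"object with {len(data)} keys, e.g. {sorted(data)[:3]}"
    if isinstance(data, list):
        return f"array with {len(data)} items"
    return f"scalar of type {type(data).__name__}"


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    target = Path("dataset/generated/category_description_gpt4.json")
    if target.exists():
        print(summarize_json(target))
    else:
        print(f"{target} not found; run the generation step first")
```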

Experiments

Train & Evaluate Models

By executing train.py, you can train and evaluate the news recommendation model.

To train and evaluate all models (NAML, NRMS, NPA, each with BERT or DistilBERT) and all methods (title only, template-based, generated-description), execute the following command:

$ rye run python src/experiment/train.py -m pretrained="distilbert-base-uncased","bert-base-uncased" gradient_accumulation_steps=16 batch_size=8 augmentation_method=GPT4,TEMPLATE_BASED,NONE news_recommendation_model=NAML,NRMS,NPA max_len=64
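
The command above passes several comma-separated values for some options. Assuming that `-m` triggers a Hydra-style multirun that launches one run per combination of those values (the README does not name the framework, so this is an assumption), the sketch below enumerates the resulting configurations. `SWEEP`, `FIXED`, and `expand_sweep` are illustrative names and do not exist in the repository.

```python
from itertools import product

# Options that receive several values in the sweep command above.
SWEEP = {
    "pretrained": ["distilbert-base-uncased", "bert-base-uncased"],
    "augmentation_method": ["GPT4", "TEMPLATE_BASED", "NONE"],
    "news_recommendation_model": ["NAML", "NRMS", "NPA"],
}

# Options that receive a single value and are shared by every run.
FIXED = {"gradient_accumulation_steps": 16, "batch_size": 8, "max_len": 64}


def expand_sweep(sweep, fixed):
    """Return one configuration dict per combination of the swept values."""
    keys = list(sweep)
    combos = product(*(sweep[k] for k in keys))
    return [{**fixed, **dict(zip(keys, values))} for values in combos]


if __name__ == "__main__":
    runs = expand_sweep(SWEEP, FIXED)
    print(len(runs))  # 2 pre-trained models x 3 methods x 3 rec models = 18
    for run in runs[:3]:
        print(run)
```

Under that assumption, the full sweep covers 18 training runs, which matches the 18 rows of the evaluation table.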

If you want to try a specific model or method individually, you can specify it using arguments when running train.py.

news_recommendation_model(NAML,NRMS,NPA): Specifies the recommendation model.
pretrained("distilbert-base-uncased","bert-base-uncased"): Specifies the pre-trained model.
augmentation_method(GPT4,TEMPLATE_BASED,NONE): Specifies the augmentation method for the input text.

For example, if you want to try generated-description with NPA + DistilBERT, please run the following command:

$ rye run python src/experiment/train.py -m pretrained="distilbert-base-uncased" gradient_accumulation_steps=16 batch_size=8 augmentation_method=GPT4 news_recommendation_model=NPA max_len=64

Evaluation Result
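
The table below reports AUC, MRR, nDCG@5, and nDCG@10. As a reminder of what these metrics measure, here is a minimal sketch for a single impression with binary click labels. It uses common textbook definitions and is not taken from src/evaluation/RecEvaluator.py; implementations differ in details such as tie handling and how impressions with several clicks are scored, so the repository's evaluator may compute slightly different values. All function names are hypothetical.

```python
import math


def ranked_labels(labels, scores):
    """Return the labels reordered by descending model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def reciprocal_rank(labels, scores):
    """1 / rank of the first clicked item, or 0.0 if nothing was clicked."""
    for rank, label in enumerate(ranked_labels(labels, scores), start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0


def dcg_at_k(ranked, k):
    """Discounted cumulative gain over the top-k positions."""
    return sum(label / math.log2(pos + 1) for pos, label in enumerate(ranked[:k], start=1))


def ndcg_at_k(labels, scores, k):
    """DCG of the model ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(ranked_labels(labels, scores), k) / ideal


def auc(labels, scores):
    """Fraction of (clicked, non-clicked) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one clicked and one non-clicked item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [0, 1, 0, 1]          # 1 = clicked candidate
    scores = [0.9, 0.8, 0.3, 0.1]  # model scores for the same candidates
    print(reciprocal_rank(labels, scores))        # 0.5 (first click is at rank 2)
    print(auc(labels, scores))                    # 0.25 (1 of 4 pairs ordered correctly)
    print(round(ndcg_at_k(labels, scores, 5), 3))
```

MRR is the mean of the reciprocal rank over all impressions, and the other metrics are averaged over impressions in the same way.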

Rec Model	PLM	Method	AUC	MRR	nDCG@5	nDCG@10
NAML	DistilBERT	title only	0.675	0.292	0.317	0.384
NAML	DistilBERT	title + template-based	0.690	0.295	0.327	0.393
NAML	DistilBERT	title + generated-description (ours)	0.713	0.326	0.363	0.425
NAML	BERT	title only	0.700	0.318	0.350	0.414
NAML	BERT	title + template-based	0.696	0.308	0.340	0.405
NAML	BERT	title + generated-description (ours)	0.707	0.322	0.357	0.420
NRMS	DistilBERT	title only	0.674	0.297	0.322	0.387
NRMS	DistilBERT	title + template-based	0.675	0.311	0.341	0.400
NRMS	DistilBERT	title + generated-description (ours)	0.707	0.324	0.359	0.422
NRMS	BERT	title only	0.689	0.306	0.336	0.400
NRMS	BERT	title + template-based	0.667	0.301	0.329	0.389
NRMS	BERT	title + generated-description (ours)	0.706	0.320	0.355	0.418
NPA	DistilBERT	title only	0.700	0.311	0.344	0.408
NPA	DistilBERT	title + template-based	0.698	0.309	0.342	0.407
NPA	DistilBERT	title + generated-description (ours)	0.707	0.319	0.354	0.417
NPA	BERT	title only	0.689	0.301	0.332	0.398
NPA	BERT	title + template-based	0.694	0.314	0.345	0.410
NPA	BERT	title + generated-description (ours)	0.710	0.324	0.360	0.422
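
To read the table in terms of relative improvement, the gain of one row over another can be computed as sketched below. The example uses the AUC values of the NRMS + BERT rows; `relative_gain` is a hypothetical helper, and treating the 5.8% figure from the Overview as this particular comparison is an assumption based on the numbers in the table, not a statement from the paper.

```python
def relative_gain(baseline: float, ours: float) -> float:
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


# AUC values copied from the NRMS + BERT rows of the table above.
auc_title_only = 0.689
auc_template = 0.667
auc_generated = 0.706

print(f"vs. title only:     {relative_gain(auc_title_only, auc_generated):.2f}%")  # 2.47%
print(f"vs. template-based: {relative_gain(auc_template, auc_generated):.2f}%")    # 5.85%
```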

Citation

TBD
