eecrazy/sensational_headline

Clickbait? Sensational Headline Generation with Auto-tuned Reinforcement Learning

This is the PyTorch implementation of the paper:

Clickbait? Sensational Headline Generation with Auto-tuned Reinforcement Learning. Peng Xu, Chien-Sheng Wu, Andrea Madotto, Pascale Fung. EMNLP 2019. [PDF]

This code is written in Python 3 with PyTorch >= 0.4.0 and is built on top of https://github.com/atulkum/pointer_summarizer. If you use any source code or datasets included in this toolkit in your work, please cite the following paper. The BibTeX entry is listed below:

@inproceedings{xu2019clickbait,
  title={Clickbait? Sensational Headline Generation with Auto-tuned Reinforcement Learning},
  author={Xu, Peng and Wu, Chien-Sheng and Madotto, Andrea and Fung, Pascale},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
  pages={3056--3066},
  year={2019}
}

Abstract

Sensational headlines are headlines that capture people's attention and generate reader interest. Conventional abstractive headline generation methods, unlike human writers, do not optimize for maximal reader attention. In this paper, we propose a model that generates sensational headlines without labeled data. We first train a sensationalism scorer by classifying online headlines with many comments ("clickbait") against a baseline of headlines generated from a summarization model. The score from the sensationalism scorer is used as the reward for a reinforcement learner. However, maximizing the noisy sensationalism reward will generate unnatural phrases instead of sensational headlines. To effectively leverage this noisy reward, we propose a novel loss function, Auto-tuned Reinforcement Learning (ARL), to dynamically balance reinforcement learning (RL) with maximum likelihood estimation (MLE). Human evaluation shows that 60.8% of samples generated by our model are sensational, which is significantly better than the Pointer-Gen baseline and other RL models.

Auto-tuned Reinforcement Learning:

The loss function of Auto-tuned Reinforcement Learning is a weighted sum of the RL loss and the MLE loss, where the weight is decided by the sensationalism scorer or any other reward function.
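A minimal sketch of such a weighted sum, written in plain Python so it runs without PyTorch. The function name `auto_tuned_loss` and its arguments are hypothetical and do not come from this repository. The sketch also assumes that the weight lies in [0, 1] and multiplies the MLE term, with the remainder going to the RL term; check the paper and `sensation_generation.py` for the actual formulation.

```python
def auto_tuned_loss(rl_loss: float, mle_loss: float, weight: float) -> float:
    """Blend an RL loss and an MLE loss with a per-example weight.

    Assumption (not confirmed by this README): `weight` is a score in [0, 1],
    for example the output of a sensationalism scorer, and it multiplies the
    MLE term while (1 - weight) multiplies the RL term.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * mle_loss + (1.0 - weight) * rl_loss


# Hypothetical loss values for a single training example.
blended = auto_tuned_loss(rl_loss=2.0, mle_loss=4.0, weight=0.25)
print(blended)  # 0.25 * 4.0 + 0.75 * 2.0 = 2.5
```

Under this assumption, a weight near 1 makes the update behave like plain MLE training, while a weight near 0 lets the reward-driven RL term dominate.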

Sensationalization Strategies

Our model is able to generate sensational headlines using diverse sensationalization strategies. These strategies include, but are not limited to, creating a curiosity gap, asking questions, highlighting numbers, being emotional and emphasizing the user.

Dependency

Check the required packages in requirements.txt, or simply run the command:

❱❱❱ pip install -r requirements.txt

Resources needed

You can download the trained model and unzip it into the project home directory.

To train and run your own model, download the datasets and unzip them into the project home directory.

We also use pretrained Chinese embeddings from this website, or you can download them directly from here.

Experiment

Quick Result

To skip training, check the saved test predictions:

Pointer-Gen: save/PointerAttn/Pointer_Gen/test_prediction

Pointer-Gen+RL-ROUGE: save/Rl/Pointer_Gen_RL_ROUGE/test_prediction

Pointer-Gen+RL-SEN: save/Rl/Pointer_Gen_RL_SEN/test_prediction

Pointer-Gen+ARL-SEN: save/Rl/Pointer_Gen_ARL_SEN/test_prediction

Training

Pointer-Gen+RL-SEN

❱❱❱ python sensation_generation.py -path save/PointerAttn/Pointer_Gen/ -sensation_scorer_path save/sensation/512_0.9579935073852539/ -thd 0.1 -use_rl True -use_s_score 0 -ml_wt 0.5

Pointer-Gen+ARL-SEN

❱❱❱ python sensation_generation.py -path save/PointerAttn/Pointer_Gen/ -sensation_scorer_path save/sensation/512_0.9579935073852539/ -thd 0.1 -use_rl True -use_s_score 1

Generation

Pointer-Gen

❱❱❱ python sensation_save.py -path save/PointerAttn/Pointer_Gen/ -sensation_scorer_path save/sensation/512_0.9579935073852539/ -use_s_score 0 -thd 0.0