Semantic Autoencoder (SemAE)

This repository presents the implementation of the ACL 2022 paper:

Unsupervised Extractive Opinion Summarization Using Sparse Coding,
Somnath Basu Roy Chowdhury, Chao Zhao and Snigdha Chaturvedi

The implementation of SemAE is based on the open-source framework of Quantized Transformer.

Data

Download the SPACE corpus from this link. The Amazon dataset is publicly available here.

The Amazon data was preprocessed following the instructions from here.

To directly access the data used in our experiments, use the files in this link as the data/ folder. Please cite the respective papers if you are using the above datasets.

Using our model

Setting up the environment

Python version: python3.6
Dependencies: Use the requirements.txt file and conda/pip to install all necessary dependencies. E.g., for pip:
```
pip install -U pip
pip install -U setuptools
pip install -r requirements.txt
```

Training SemAE

To train SemAE on a subset of the training set using a GPU, go to the ./src directory and run the following:

```
python3 train.py --max_num_entities 500 --run_id space_run --gpu 0
```

This will train a SemAE model with default hyperparameters (for general summarization), store tensorboard logs under ./logs and save a model snapshot after every epoch under ./models (filename: space_run_<epoch>_model.pt).
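Because one snapshot is written per epoch, it can be handy to pick the most recent one automatically before running inference. The following is a minimal sketch, not part of the repository: `latest_checkpoint` is a hypothetical helper, and it assumes only the `<run_id>_<epoch>_model.pt` naming scheme described above.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(models_dir: str, run_id: str) -> Optional[Path]:
    """Return the snapshot with the highest epoch number, or None if none match.

    Hypothetical helper: relies only on the <run_id>_<epoch>_model.pt pattern.
    """
    pattern = re.compile(rf"^{re.escape(run_id)}_(\d+)_model\.pt$")
    best_epoch, best_path = -1, None
    for path in Path(models_dir).glob("*.pt"):
        match = pattern.match(path.name)
        # Compare epochs as integers so that epoch 10 ranks above epoch 2.
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path
```

For example, `latest_checkpoint("../models", "space_run")` returns the path of the highest-epoch `space_run` snapshot if the models folder sits one level above the working directory; adjust the path to your layout.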

For training the full model on SPACE, run the following:

```
cd scripts/
chmod +x train_space.sh
./train_space.sh
```

To train the model on the full Amazon dataset, run the scripts/train_amazon.sh bash script in the same way.

Summarization with SemAE

To perform general opinion summarization with a trained SemAE model, go to the ./src directory and run the following:

```
python3 inference.py \
    --model ../models/space_run_10_model.pt \
    --run_id space_run \
    --gpu 0
```

This stores the summaries under ./outputs/space_run and the ROUGE evaluation results in ./outputs/eval_space_run.json.

For aspect opinion summarization, run:

```
python3 aspect_inference.py \
    --model ../models/space_run_10_model.pt \
    --sample_sentences --run_id aspects_run \
    --gpu 0
```

The summarization scripts for SPACE and Amazon are: scripts/evaluate_*.sh
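The ROUGE results that inference.py writes to ./outputs/eval_space_run.json can be inspected with a few lines of Python. This is a sketch under assumptions: the layout of that JSON file is not documented here, so the code makes no claim about its keys and simply flattens whatever nested numeric values it finds. `flatten_scores` and `load_scores` are hypothetical helper names.

```python
import json
from typing import Any, Dict


def flatten_scores(node: Any, prefix: str = "") -> Dict[str, float]:
    """Collect every numeric leaf of a nested JSON object under a dotted key.

    Assumption: scores are stored as numbers inside (possibly nested) objects.
    Strings, booleans, and lists are skipped.
    """
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten_scores(value, child))
    elif isinstance(node, (int, float)) and not isinstance(node, bool):
        flat[prefix] = float(node)
    return flat


def load_scores(path: str) -> Dict[str, float]:
    """Read an evaluation JSON file and return its flattened numeric values."""
    with open(path, encoding="utf-8") as handle:
        return flatten_scores(json.load(handle))
```

Calling `load_scores` on the evaluation file and printing the sorted items gives a quick side-by-side view of runs; point the path at wherever the outputs folder sits relative to your working directory.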

Citation

```
@inproceedings{chowdhury2022unsupervised,
    title = {Unsupervised Extractive Opinion Summarization Using Sparse Coding},
    author = {Basu Roy Chowdhury, Somnath and
      Zhao, Chao and
      Chaturvedi, Snigdha},
    booktitle = {ACL},
    year = {2022},
}
```
