dubuisa / abae-pytorch

PyTorch implementation of 'An Unsupervised Neural Attention Model for Aspect Extraction' by He et al., ACL 2017

ABAE-PyTorch

Yet another PyTorch implementation of the model described in the paper An Unsupervised Neural Attention Model for Aspect Extraction by Ruidan He, Wee Sun Lee, Hwee Tou Ng, and Daniel Dahlmeier (ACL 2017).
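
The model, in brief: each sentence is embedded as an attention-weighted average of its word vectors, a softmax layer maps that embedding to a distribution over K aspects, and the sentence embedding is then reconstructed as a combination of rows of an aspect embedding matrix. Training maximizes the margin between the reconstruction's similarity to the true sentence embedding and to randomly sampled negative sentences, with an orthogonality penalty keeping the aspect rows diverse. A minimal PyTorch sketch of those pieces (class and function names are illustrative, not this repository's actual API):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ABAESketch(nn.Module):
    # Simplified ABAE forward pass; illustrative, not this repo's actual classes.
    def __init__(self, emb_dim, n_aspects):
        super().__init__()
        self.M = nn.Parameter(torch.eye(emb_dim))                # attention transform
        self.W = nn.Linear(emb_dim, n_aspects)                   # sentence -> aspect logits
        self.T = nn.Parameter(torch.randn(n_aspects, emb_dim))   # aspect embedding matrix

    def forward(self, e):                      # e: (batch, seq, emb_dim) word vectors
        y = e.mean(dim=1)                      # average word vector y_s
        d = torch.einsum('bse,ef,bf->bs', e, self.M, y)   # d_i = e_i^T M y_s
        a = F.softmax(d, dim=1)                # attention weights over words
        z = (a.unsqueeze(-1) * e).sum(dim=1)   # sentence embedding z_s
        p = F.softmax(self.W(z), dim=1)        # aspect distribution p_t
        r = p @ self.T                         # reconstruction r_s from aspects
        return z, r

def max_margin_loss(z, r, z_neg):
    # Hinge loss pushing r towards z and away from negative-sample embeddings.
    z, r, z_neg = (F.normalize(t, dim=-1) for t in (z, r, z_neg))
    pos = (r * z).sum(-1, keepdim=True)            # (batch, 1)
    neg = torch.einsum('be,bne->bn', r, z_neg)     # (batch, n_neg)
    return F.relu(1.0 - pos + neg).sum(dim=1).mean()

def ortho_penalty(T):
    # ||T_n T_n^T - I||^2 encourages near-orthogonal (diverse) aspect rows.
    Tn = F.normalize(T, dim=1)
    eye = torch.eye(Tn.size(0), device=Tn.device)
    return ((Tn @ Tn.t()) - eye).pow(2).sum()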

Example

For a working example of the whole pipeline, please refer to example_run.sh.

Let's get some data:

wget http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Electronics_5.json.gz
gunzip reviews_Electronics_5.json.gz    
python3 custom_format_converter.py reviews_Electronics_5.json
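
The downloaded file contains one JSON review per line, with the review body in a reviewText field; the converter produces the plain-text format the trainer expects (one tokenized sentence per line, see --dataset-path below). A rough sketch of such a conversion follows; the sentence splitting and tokenization here are assumptions, not necessarily what custom_format_converter.py actually does:

import json
import re
import sys

def convert(path):
    # Writes <path>.txt with one lowercased, whitespace-tokenized sentence per line.
    with open(path) as fin, open(path + '.txt', 'w') as fout:
        for line in fin:
            text = json.loads(line).get('reviewText', '')
            for sent in re.split(r'[.!?]+', text):             # naive sentence split
                tokens = re.findall(r"[a-z']+", sent.lower())  # crude tokenizer
                if tokens:
                    fout.write(' '.join(tokens) + '\n')

if __name__ == '__main__':
    convert(sys.argv[1])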

Then we need to train the word vectors:

python3 word2vec.py reviews_Electronics_5.json.txt
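
word2vec.py presumably fits word embeddings on that file; with gensim (an assumption about the underlying library), the core of it is roughly:

from gensim.models import Word2Vec

# Each line of the corpus file is already one tokenized sentence,
# so gensim's corpus_file streaming can read it directly (gensim >= 4 API).
model = Word2Vec(corpus_file='reviews_Electronics_5.json.txt',
                 vector_size=200, window=5, min_count=5, workers=4)
model.wv.save('reviews_Electronics_5.json.txt.w2v')  # output name is hypothetical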

And finally, run main.py:

usage: main.py [-h] [--word-vectors-path <str>] [--batch-size BATCH_SIZE]
               [--aspects-number ASPECTS_NUMBER] [--ortho-reg ORTHO_REG]
               [--epochs EPOCHS] [--optimizer {adam,adagrad,sgd}]
               [--negative-samples NEG_SAMPLES] [--dataset-path DATASET_PATH]
               [--maxlen MAXLEN]

optional arguments:
  -h, --help            show this help message and exit
  --word-vectors-path <str>, -wv <str>
                        Path to word vectors file
  --batch-size BATCH_SIZE, -b BATCH_SIZE
                        Batch size for training
  --aspects-number ASPECTS_NUMBER, -as ASPECTS_NUMBER
                        Total number of aspects
  --ortho-reg ORTHO_REG, -orth ORTHO_REG
                        Ortho-regularization impact coefficient
  --epochs EPOCHS, -e EPOCHS
                        Epochs count
  --optimizer {adam,adagrad,sgd}, -opt {adam,adagrad,sgd}
                        Optimizer
  --negative-samples NEG_SAMPLES, -ns NEG_SAMPLES
                        Negative samples per positive one
  --dataset-path DATASET_PATH, -d DATASET_PATH
                        Path to a training texts file. One sentence per line,
                        tokens separated with spaces.
  --maxlen MAXLEN, -l MAXLEN
                        Maximum sentence length; longer sentences are clipped
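
For instance (all values below are illustrative; the word-vectors filename in particular depends on what word2vec.py actually writes):

python3 main.py -d reviews_Electronics_5.json.txt -wv reviews_Electronics_5.json.txt.w2v -as 30 -b 50 -e 15 -opt adam -ns 20 -l 100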


I acknowledge that the implementation is raw; pull requests and issues are welcome.

TODOs

  • Evaluation: PMI, NPMI, LCP, L1/L2/coord/cosine (Nikolenko SIGIR'16), ... (see the NPMI sketch after this list)
  • Aspects prediction on text + visualization
  • Saving the model, aspects, etc.
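
On the evaluation TODO: NPMI coherence for an aspect's top words can be computed from document co-occurrence counts alone. A self-contained sketch, independent of this repository's code:

import math
from itertools import combinations

def npmi_coherence(top_words, documents):
    # top_words: top-ranked words of one aspect; documents: list of token lists.
    doc_sets = [set(doc) for doc in documents]
    n = len(doc_sets)
    def p(*words):                                   # document frequency estimate
        return sum(all(w in s for w in words) for s in doc_sets) / n
    scores = []
    for w1, w2 in combinations(top_words, 2):
        p12 = p(w1, w2)
        if p12 == 0:
            scores.append(-1.0)                      # never co-occur
        elif p12 == 1:
            scores.append(1.0)                       # co-occur in every document
        else:
            pmi = math.log(p12 / (p(w1) * p(w2)))
            scores.append(pmi / -math.log(p12))      # normalized into [-1, 1]
    return sum(scores) / max(len(scores), 1)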


License: MIT

