aqweteddy / LeverageJustAFewKeywords

Unofficial implementation for EMNLP2019 <Leveraging Just a Few Keywords for Fine-Grained Aspect Detection Through Weakly Supervised Co-Training>

Leveraging Just a Few Keywords for Fine-Grained Aspect Detection Through Weakly Supervised Co-Training

  • original paper
  • I could not find an official implementation or any other unofficial one.

Dataset

requirements

  • pytorch >= 1.5
  • numpy
  • h5py
  • click

How to Run

preprocess data

  • Preprocess the data following OPOSUM:
    • hdf5 (train data and test data)
    • seed words
  • Extract the preprocessed hdf5 data with extract_data.py:
python extract_data.py --source oposum/data/preprocessed/BOOTS.hdf5 --output data/boots_train.json
python extract_data.py --source oposum/data/preprocessed/BOOTS_TEST.hdf5 --output data/boots_test.json

train

  • You can set the configuration in config.py or via command-line arguments.
    • Note that the index of the general aspect differs between datasets.
  • Start training
    • The seed words are unweighted.
python parser.py --train_file ./data/boots_train.json --test_file ./data/boots_test.json --save_dir ./ckpt/boots --aspect_init_file ./data/seed_words.txt --epochs 3
  • Run python parser.py --help to see all options in detail.

Benchmark

OPOSUM

  • Bags: 0.59 (RandomSampler, 50,000 samples per epoch, 3 epochs)
  • TV:

Some differences between the paper and this implementation

RandomSampler

  • Randomly samples 50,000 examples every epoch.
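
The per-epoch subsampling can be sketched in plain Python as follows. This is only an illustration, not the repository's code: the function name `sample_epoch_indices` is hypothetical, and drawing without replacement is an assumption, since the README does not say which variant is used.

```python
import random


def sample_epoch_indices(dataset_size, num_samples=50000, seed=None):
    """Pick the example indices to visit in one epoch.

    Hypothetical helper: draws up to `num_samples` distinct indices,
    so each epoch sees a fresh random subset of the training data.
    """
    rng = random.Random(seed)
    # Never ask for more indices than the dataset holds.
    k = min(num_samples, dataset_size)
    return rng.sample(range(dataset_size), k)


if __name__ == "__main__":
    for epoch in range(3):
        indices = sample_epoch_indices(200000, seed=epoch)
        print(epoch, len(indices), indices[:5])
```

In PyTorch, `torch.utils.data.RandomSampler` with `replacement=True` and `num_samples=50000` gives a similar effect, although it samples with replacement.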

Teacher

  • In the paper (Sec. 3.1):

If no seed word appears in s, then the teacher predicts the "General" aspect by setting $q_i^k = 1$

but in this implementation:

If no seed word appears in s, the teacher predicts as if a seed word of the general aspect had appeared once.
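
The difference between the two fallback rules can be illustrated with a small sketch. It assumes the teacher scores each aspect with a softmax over per-aspect seed-word counts; that scoring rule, the function name `teacher_predict`, and the toy seed-word sets are all assumptions made for illustration, not code taken from this repository.

```python
import math


def teacher_predict(tokens, seed_words, general_idx, paper_rule=True):
    """Return a distribution over aspects for one segment.

    tokens:      list of words in the segment s
    seed_words:  list of sets, one set of seed words per aspect (hypothetical layout)
    general_idx: index of the "General" aspect
    paper_rule:  True  -> one-hot "General" when no seed word appears (paper)
                 False -> act as if one general seed word appeared (this README)
    """
    # Count how many tokens hit each aspect's seed-word set.
    counts = [sum(1 for t in tokens if t in seeds) for seeds in seed_words]

    if sum(counts) == 0:
        if paper_rule:
            # Paper, Sec. 3.1: set q^k = 1 for the "General" aspect.
            return [1.0 if k == general_idx else 0.0
                    for k in range(len(seed_words))]
        # Variant described above: pretend a general seed word appeared once.
        counts[general_idx] = 1

    # Assumed scoring rule: softmax over the seed-word counts.
    exps = [math.exp(c) for c in counts]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    seeds = [{"price", "cheap"}, {"comfort", "fit"}, {"boots", "shoes"}]
    segment = ["arrived", "on", "time"]  # contains no seed word
    print(teacher_predict(segment, seeds, general_idx=2, paper_rule=True))
    print(teacher_predict(segment, seeds, general_idx=2, paper_rule=False))
```

Under this assumed scoring rule, the paper's fallback yields a hard one-hot label, while the variant yields a softer distribution that still favors the general aspect.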
