xinyadu / doc_event_role

Reproducing score

spookyQubit opened this issue · comments

Hi @xinyadu, thanks a lot for sharing the code. I am trying to reproduce the scores from Table 1 of the paper. Thanks to your documentation, I was able to follow the steps outlined in this repo closely and run the experiment all the way through. However, the scores I get at the end do not match those reported in Table 1.

The steps I took were:

1. Environment setup

$ git clone https://github.com/xinyadu/doc_event_role.git
$ cd doc_event_role
$ touch environment.yml  # add the requirements in this file
$ cat environment.yml 
name: muc_seq_acl
channels:
  - defaults
dependencies:
  - python==3.5.6
  - pip==20.1.1
  - spacy==2.0.12
  - cudatoolkit==9.2
  - pip:
    - torch==0.4.1
    - pytorch-pretrained-bert==0.6.2
    - typing==3.7.4.3
$ conda env create -f environment.yml
$ conda activate muc_seq_acl
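After activation, a quick sanity check that the pins resolved as intended (a minimal sketch; the "expect" comments just restate the environment.yml above):

# check_env.py: confirm the pinned packages resolved as expected
import sys
import spacy
import torch

print(sys.version.split()[0])     # expect 3.5.6 per environment.yml
print(spacy.__version__)          # expect 2.0.12
print(torch.__version__)          # expect 0.4.1
print(torch.cuda.is_available())  # True if cudatoolkit 9.2 matches the driver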

2. Download the spaCy model

$ python3 -m spacy download en_core_web_sm

3. Download GloVe

Download glove.6B.100d.txt and place it at doc_event_role/model/code/utils/glove.6B.100d.txt.
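If it helps, a minimal download sketch in Python (assuming the standard Stanford mirror; the repo does not pin a source, so any copy of glove.6B.100d.txt works):

# get_glove.py: fetch glove.6B.zip and extract the 100d vectors
# (assumes the standard Stanford mirror; any glove.6B.100d.txt copy works)
import io
import urllib.request
import zipfile

URL = "http://nlp.stanford.edu/data/glove.6B.zip"  # ~860 MB archive
OUT = "model/code/utils/glove.6B.100d.txt"         # path the config expects

with urllib.request.urlopen(URL) as resp:
    archive = zipfile.ZipFile(io.BytesIO(resp.read()))
with archive.open("glove.6B.100d.txt") as src, open(OUT, "wb") as dst:
    dst.write(src.read())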

4. Train

$ cd model/code
$ mkdir data_seq_tag_pairs
$ python gen_seq_tag_pairs.py --div train
$ mkdir model_save
$ mkdir model_out
$ python main.py --config config/example.config
$ python seq_to_extracts.py --seqfile model_out/multi_bert.out  # generates pred.json for eval.py

The Dev scores I see during training are (a small script for extracting the best F-1 follows the log):

Dev: time: 193.15s, speed: 15.20st/s; p: 28.1324, r: 70.0204, f: 40.1383
Dev: time: 124.92s, speed: 15.35st/s; p: 62.9229, r: 39.4083, f: 48.4639
Dev: time: 124.47s, speed: 15.36st/s; p: 66.1268, r: 43.9025, f: 52.7702
Dev: time: 133.18s, speed: 15.39st/s; p: 52.8802, r: 66.1970, f: 58.7939
Dev: time: 126.86s, speed: 15.37st/s; p: 63.4102, r: 48.3327, f: 54.8542
Dev: time: 124.93s, speed: 15.33st/s; p: 65.0108, r: 44.1431, f: 52.5823
Dev: time: 123.39s, speed: 15.36st/s; p: 74.8237, r: 37.4043, f: 49.8758
Dev: time: 125.21s, speed: 15.34st/s; p: 72.0788, r: 46.6892, f: 56.6702
Dev: time: 131.71s, speed: 15.31st/s; p: 57.4742, r: 69.6806, f: 62.9915
Dev: time: 129.72s, speed: 15.36st/s; p: 60.7873, r: 62.0204, f: 61.3977
Dev: time: 136.96s, speed: 15.33st/s; p: 47.4962, r: 73.2321, f: 57.6211
Dev: time: 128.23s, speed: 15.33st/s; p: 63.9869, r: 58.7949, f: 61.2811
Dev: time: 131.07s, speed: 15.36st/s; p: 58.6679, r: 62.1833, f: 60.3745
Dev: time: 126.07s, speed: 15.35st/s; p: 65.7156, r: 50.6334, f: 57.1970
Dev: time: 123.92s, speed: 15.33st/s; p: 72.3295, r: 41.9134, f: 53.0724
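For reference, a quick way to pull the best dev F-1 out of such a log; a minimal sketch assuming the "Dev: ...; p: X, r: Y, f: Z" line format above and a hypothetical saved log file train.log:

# best_dev_f1.py: report the best dev F-1 seen during training
# (assumes the "Dev: ...; p: X, r: Y, f: Z" log format; train.log is hypothetical)
import re

with open("train.log") as fh:
    f1s = [float(m.group(1)) for m in re.finditer(r"Dev:.*f: ([\d.]+)", fh.read())]
print(max(f1s))  # 62.9915 for the run above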

5. Test scores

$ python eval.py --goldfile ./data/processed/test.json --predfile ./model/code/pred.json

================Exact Match=================
Prec, Recall, F-1
PerpInd
55.1020 39.8649 46.2611
PerpOrg
40.9091 57.1429 47.6821
Target
45.3744 63.4483 52.9105
Victim
46.5753 71.5789 56.4315
Weapon
44.8980 72.1311 55.3459
MACRO average:
46.5718 60.8332 52.7557
===============Head Noun Match===============
Prec, Recall, F-1
PerpInd
59.5745 42.5676 49.6552
PerpOrg
47.7612 65.4762 55.2330
Target
60.2804 78.6207 68.2397
Victim
49.6503 72.6316 58.9815
Weapon
48.9362 75.4098 59.3548
MACRO average:
53.2405 66.9412 59.309
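A side note on aggregation: the MACRO average P and R printed by eval.py are plain means over the five roles, and the macro F-1 matches the harmonic mean of those two means rather than the mean of the per-role F-1s. A minimal check against the exact-match numbers above:

# macro_check.py: reproduce the exact-match "MACRO average" line
p = [55.1020, 40.9091, 45.3744, 46.5753, 44.8980]  # per-role precision
r = [39.8649, 57.1429, 63.4483, 71.5789, 72.1311]  # per-role recall
macro_p = sum(p) / len(p)
macro_r = sum(r) / len(r)
macro_f = 2 * macro_p * macro_r / (macro_p + macro_r)
print(round(macro_p, 4), round(macro_r, 4), round(macro_f, 4))
# 46.5718 60.8332 52.7557, matching the MACRO average line above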

What I am not sure about is whether this score corresponds to the Multi-Granularity Reader row in Table 1 of the paper, or to some other row. Also, was the config file used for the Multi-Granularity Reader experiment different from example.config?

I have the same question. Is your problem solved?

The score should correspond to the model proposed in the paper.

Why are the F1 score, precision, and recall coming out as -1? I ran the code for just one iteration.

Seed num: 42
MODEL: train
Load pretrained word embedding, norm: False, dir: utils/glove.6B.100d.txt
Embedding:
     pretrain word:400000, prefect match:9915, case_match:0, oov:2271, oov%:0.18634610650693362
Training model...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
 I/O:
     Start   Sequence   Laebling   task...
     Tag          scheme: BIO
     Split         token:  ||| 
     MAX SENTENCE LENGTH: 200
     MAX   WORD   LENGTH: -1
     Number   normalized: False
     Word  alphabet size: 12187
     Char  alphabet size: 53
     Label alphabet size: 12
     Word embedding  dir: utils/glove.6B.100d.txt
     Char embedding  dir: None
     Word embedding size: 100
     Char embedding size: 30
     Norm   word     emb: False
     Norm   char     emb: False
     Train  file directory: ../data_seq_tag_pairs/train_full
     Dev    file directory: ../data_seq_tag_pairs/dev_full
     Test   file directory: ../data_seq_tag_pairs/test
     Raw    file directory: None
     Dset   file directory: model_save/multi_bert.dset
     Model  file directory: model_save/multi_bert
     Loadmodel   directory: model_save/multi_bert.best.model
     Decode file directory: model_out/multi_bert.out
     Train instance number: 9409
     Dev   instance number: 0
     Test  instance number: 1112
     Raw   instance number: 0
     FEATURE num: 0
 ++++++++++++++++++++++++++++++++++++++++
 Model Network:
     Model        use_crf: True
     Model word extractor: LSTM
     Model       use_char: False
 ++++++++++++++++++++++++++++++++++++++++
 Training:
     Optimizer: SGD
     Iteration: 1
     BatchSize: 5
     Average  batch   loss: False
 ++++++++++++++++++++++++++++++++++++++++
 Hyperparameters:
     Hyper              lr: 0.015
     Hyper        lr_decay: 0.05
     Hyper         HP_clip: None
     Hyper        momentum: 0.0
     Hyper              l2: 1e-08
     Hyper      hidden_dim: 200
     Hyper         dropout: 0.4
     Hyper      lstm_layer: 1
     Hyper          bilstm: True
     Hyper             GPU: True
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
build sequence labeling network...
word feature extractor:  LSTM
use crf:  True
build word sequence feature extractor: LSTM...
build word representation...
100% 407873900/407873900 [00:07<00:00, 56807815.17B/s]
build CRF...

Epoch: 0/1
 Learning rate is set as: 0.015
Shuffle: first input word list: [2, 379, 761, 9, 607, 920, 542, 830, 2393, 7616, 470, 38, 410, 165, 101, 161, 47, 1919, 30, 751, 49, 33, 1351, 103, 185, 34, 119, 120, 42, 547, 2, 800, 101, 542, 8, 473, 378, 38, 7616, 136, 137, 1491, 538, 9, 257, 631, 412, 10751, 47, 410, 3866, 42, 3395, 3107, 5459, 47, 410, 741, 2464, 82, 10752, 42, 82, 103, 9774, 3549, 38, 412]
/content/drive/MyDrive/LDP/Granularity/doc_event_role/model/code/model/crf.py:92: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (Triggered internally at  ../aten/src/ATen/native/IndexingUtils.h:28.)
  masked_cur_partition = cur_partition.masked_select(mask_idx)
/content/drive/MyDrive/LDP/Granularity/doc_event_role/model/code/model/crf.py:97: UserWarning: masked_scatter_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (Triggered internally at  ../aten/src/ATen/native/cuda/IndexKernel.cpp:62.)
  partition.masked_scatter_(mask_idx, masked_cur_partition)
/content/drive/MyDrive/LDP/Granularity/doc_event_role/model/code/model/crf.py:246: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (Triggered internally at  ../aten/src/ATen/native/IndexingUtils.h:28.)
  tg_energy = tg_energy.masked_select(mask.transpose(1,0))
/content/drive/MyDrive/LDP/Granularity/doc_event_role/model/code/model/crf.py:159: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (Triggered internally at  ../aten/src/ATen/native/cuda/Indexing.cu:967.)
  cur_bp.masked_fill_(mask[idx].view(batch_size, 1).expand(batch_size, tag_size), 0)
/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py:175: UserWarning: masked_scatter_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (Triggered internally at  ../aten/src/ATen/native/cuda/IndexKernel.cpp:62.)
  allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py:175: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (Triggered internally at  ../aten/src/ATen/native/cuda/Indexing.cu:967.)
  allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py:175: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (Triggered internally at  ../aten/src/ATen/native/IndexingUtils.h:28.)
  allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
     Instance: 500; Time: 41.48s; loss: 33301.9307; acc: 32481.0/35907.0=0.9046
     Instance: 1000; Time: 43.34s; loss: 17721.0560; acc: 67413.0/73419.0=0.9182
     Instance: 1500; Time: 41.94s; loss: 9585.8511; acc: 101886.0/109830.0=0.9277
     Instance: 2000; Time: 41.54s; loss: 6581.4640; acc: 135592.0/145601.0=0.9313
     Instance: 2500; Time: 42.53s; loss: 4454.5928; acc: 170963.0/182673.0=0.9359
     Instance: 3000; Time: 43.69s; loss: 3632.2205; acc: 206277.0/219703.0=0.9389
     Instance: 3500; Time: 41.45s; loss: 3705.0391; acc: 240547.0/255728.0=0.9406
     Instance: 4000; Time: 42.23s; loss: 3311.3256; acc: 275859.0/292731.0=0.9424
     Instance: 4500; Time: 43.09s; loss: 3200.9517; acc: 309967.0/328569.0=0.9434
     Instance: 5000; Time: 41.54s; loss: 3095.8545; acc: 344268.0/364550.0=0.9444
     Instance: 5500; Time: 42.88s; loss: 3549.6006; acc: 379845.0/402111.0=0.9446
     Instance: 6000; Time: 43.32s; loss: 2930.5935; acc: 414651.0/438744.0=0.9451
     Instance: 6500; Time: 41.57s; loss: 2906.9250; acc: 448489.0/474379.0=0.9454
     Instance: 7000; Time: 42.23s; loss: 2729.6758; acc: 483452.0/511023.0=0.9460
     Instance: 7500; Time: 42.82s; loss: 3256.7528; acc: 518431.0/547845.0=0.9463
     Instance: 8000; Time: 42.00s; loss: 2533.6289; acc: 553023.0/584109.0=0.9468
     Instance: 8500; Time: 41.26s; loss: 2746.0062; acc: 586990.0/619886.0=0.9469
     Instance: 9000; Time: 41.76s; loss: 2765.5623; acc: 621129.0/655800.0=0.9471
     Instance: 9409; Time: 35.17s; loss: 2204.6501; acc: 649257.0/685466.0=0.9472
Epoch: 0 training finished. Time: 795.83s, speed: 11.82st/s,  total loss: 114213.68103027344
totalloss: 114213.68103027344
Dev: time: 4.87s, speed: 0.00st/s; p: -1.0000, r: -1.0000, f: -1.0000
!!!Exceed previous best f score: -10
Save current best model in file: model_save/multi_bert.0.model



MODEL: decode
Load Model from file:  model_save/multi_bert
build sequence labeling network...
word feature extractor:  LSTM
use crf:  True
build word sequence feature extractor: LSTM...
build word representation...
build CRF...
Predict test result has been written into file. model_out/multi_bert.out
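Regarding the -1 scores above: the data summary reports "Dev instance number: 0" (and the dev pass runs at 0.00st/s), so there are no dev instances to score, and NCRF++-style evaluation returns -1 as a sentinel in that case. Generating the dev sequence-tag pairs as well (the config reads them from ../data_seq_tag_pairs/dev_full; presumably gen_seq_tag_pairs.py accepts other --div values) should yield real dev scores. A minimal sketch of the sentinel behavior, a hypothetical simplification rather than the repo's exact code:

# prf_sentinel.py: why an empty dev set scores p/r/f = -1
# (hypothetical simplification of NCRF++-style F-measure code)
def prf(gold_num, predict_num, right_num):
    precision = right_num / predict_num if predict_num else -1
    recall = right_num / gold_num if gold_num else -1
    if precision == -1 or recall == -1 or precision + recall <= 0:
        return precision, recall, -1
    return precision, recall, 2 * precision * recall / (precision + recall)

print(prf(0, 0, 0))  # (-1, -1, -1), matching the "Dev: ... p: -1.0000" line above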