f score < 0.01 for conll2003 data
nnakamura3 opened this issue · comments
nnakamura3 commented
Hi, even after 10 epochs, the f score stays below 0.01 and accuracy stays around 0.83.
I checked my data format, and I don't think it is wrong in the ways described in past issues.
Could you give me a hand with this problem?
my config:
### use # to comment out the configure item
### I/O ###
train_dir=dataset/bpe20000/train.BIOES
dev_dir=dataset/bpe20000/valid.BIOES
test_dir=dataset/bpe20000/test.BIOES
model_dir=result/bpe20000.char.BIOES/checkpoint
#word_emb_dir=sample_data/sample.word.emb
#raw_dir=
#decode_dir=
#dset_dir=
#load_model_dir=
#char_emb_dir=
norm_word_emb=False
norm_char_emb=False
number_normalized=True
seg=True
word_emb_dim=50
char_emb_dim=30
###NetworkConfiguration###
use_crf=True
use_char=True
word_seq_feature=LSTM
char_seq_feature=CNN
#feature=[POS] emb_size=20
#feature=[Cap] emb_size=20
#nbest=1
###TrainingSetting###
status=train
optimizer=SGD
iteration=100
batch_size=32
ave_batch_loss=False
###Hyperparameters###
cnn_layer=4
char_hidden_dim=50
hidden_dim=200
dropout=0.5
lstm_layer=1
bilstm=True
learning_rate=0.015
lr_decay=0.05
momentum=0
l2=1e-8
#gpu
#clip=
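The config above is a flat key=value file where lines starting with # are comments. A minimal reader for this plain format can be sketched as follows (a hypothetical helper for illustration, not NCRF++'s actual config loader):

```python
def read_config(path):
    """Parse a flat key=value config file, skipping blank lines and # comments."""
    config = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip comments and blank lines
            key, _, value = line.partition("=")
            config[key.strip()] = value.strip()
    return config
```

Note that values come back as strings, so booleans and numbers (e.g. use_crf=True, batch_size=32) would still need conversion by the caller.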
test data (head -100):
SOCCER O
- O
JAPAN S-LOC
GET O
LUCKY O
WIN O
, O
CHINA S-PER
IN O
SURPRISE O
DEFEAT O
. O
Nadim B-PER
Ladki E-PER
AL-AIN S-LOC
, O
United B-LOC
Arab I-LOC
Emirates E-LOC
1996-12-06 O
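Since malformed tag sequences are a common cause of near-zero f scores, the two-column BIOES data above can be sanity-checked with a small validator. This is a hypothetical helper, not part of NCRF++; it only verifies that entity spans are well-formed (B then I... then E, or a lone S), ignoring entity types:

```python
def check_bioes(tags):
    """Return True if the tag sequence is a structurally valid BIOES sequence."""
    prev = "O"
    for tag in tags:
        head, _, _ = tag.partition("-")  # "B-PER" -> "B"
        if head in ("I", "E") and prev not in ("B", "I"):
            return False  # I/E must continue an already-open span
        if prev in ("B", "I") and head not in ("I", "E"):
            return False  # an open span must either continue or close
        prev = head
    return prev not in ("B", "I")  # no span may be left open at the end

# The tag column from the sample above passes this check:
sample = ["O", "S-LOC", "B-PER", "E-PER", "O"]
```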
train log:
Seed num: 42
MODEL: train
Training model...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
I/O:
Start Sequence Laebling task...
Tag scheme: BMES
Split token: |||
MAX SENTENCE LENGTH: 250
MAX WORD LENGTH: -1
Number normalized: True
Word alphabet size: 25305
Char alphabet size: 78
Label alphabet size: 18
Word embedding dir: None
Char embedding dir: None
Word embedding size: 50
Char embedding size: 30
Norm word emb: False
Norm char emb: False
Train file directory: data/conll2003/en/ner/train.BIOES.txt
Dev file directory: data/conll2003/en/ner/valid.BIOES.txt
Test file directory: data/conll2003/en/ner/test.BIOES.txt
Raw file directory: None
Dset file directory: None
Model file directory: result/wordbase.char.BIOES/checkpoint
Loadmodel directory: None
Decode file directory: None
Train instance number: 14041
Dev instance number: 3250
Test instance number: 3453
Raw instance number: 0
FEATURE num: 0
++++++++++++++++++++++++++++++++++++++++
Model Network:
Model use_crf: True
Model word extractor: LSTM
Model use_char: True
Model char extractor: CNN
Model char_hidden_dim: 50
++++++++++++++++++++++++++++++++++++++++
Training:
Optimizer: SGD
Iteration: 100
BatchSize: 32
Average batch loss: False
++++++++++++++++++++++++++++++++++++++++
Hyperparameters:
Hyper lr: 0.015
Hyper lr_decay: 0.05
Hyper HP_clip: None
Hyper momentum: 0.0
Hyper l2: 1e-08
Hyper hidden_dim: 200
Hyper dropout: 0.5
Hyper lstm_layer: 1
Hyper bilstm: True
Hyper GPU: True
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
build sequence labeling network...
use_char: True
char feature extractor: CNN
word feature extractor: LSTM
use crf: True
build word sequence feature extractor: LSTM...
build word representation...
build char sequence feature extractor: CNN ...
build CRF...
Epoch: 0/100
Learning rate is set as: 0.015
Shuffle: first input word list: [16808, 16793, 793, 791, 259]
Instance: 4000; Time: 21.95s; loss: 89460.7523; acc: 43336.0/58355.0=0.7426
Instance: 8000; Time: 21.64s; loss: 62598.9025; acc: 87683.0/117335.0=0.7473
Instance: 12000; Time: 21.70s; loss: 53243.5435; acc: 132374.0/175600.0=0.7538
Instance: 14041; Time: 10.67s; loss: 27011.2681; acc: 153735.0/203621.0=0.7550
Epoch: 0 training finished. Time: 75.95s, speed: 184.87st/s, total loss: 232314.4663696289
totalloss: 232314.4663696289
Right token = 42760 All token = 51362 acc = 0.8325220980491413
Dev: time: 4.65s, speed: 707.92st/s; acc: 0.8325, p: 0.4062, r: 0.0022, f: 0.0044
Exceed previous best f score: -10
Save current best model in file: result/wordbase.char.BIOES/checkpoint.0.model
Right token = 38308 All token = 46435 acc = 0.82498115645526
Test: time: 4.09s, speed: 856.69st/s; acc: 0.8250, p: 0.2609, r: 0.0021, f: 0.0042
Epoch: 1/100
Learning rate is set as: 0.014285714285714285
Shuffle: first input word list: [848, 192, 62, 5572, 6562, 131, 9809, 163, 41, 7800, 5131, 133, 72, 59, 15811, 6342, 6262, 6, 855, 4836, 72, 323, 57, 3834, 144, 13465, 39, 1197, 537, 1562, 303, 57, 2891, 10]
Instance: 4000; Time: 21.92s; loss: 53488.2458; acc: 44460.0/57848.0=0.7686
Instance: 8000; Time: 20.83s; loss: 55212.6742; acc: 88519.0/115384.0=0.7672
Instance: 12000; Time: 21.17s; loss: 54842.6556; acc: 133692.0/173869.0=0.7689
Instance: 14041; Time: 10.56s; loss: 27046.8551; acc: 157028.0/203621.0=0.7712
Epoch: 1 training finished. Time: 74.48s, speed: 188.52st/s, total loss: 190590.4307861328
totalloss: 190590.4307861328
Right token = 42762 All token = 51362 acc = 0.8325610373427826
Dev: time: 4.90s, speed: 668.88st/s; acc: 0.8326, p: 0.6667, r: 0.0007, f: 0.0013
Right token = 38327 All token = 46435 acc = 0.8253903305696134
Test: time: 4.45s, speed: 784.95st/s; acc: 0.8254, p: 0.5000, r: 0.0007, f: 0.0014
Epoch: 2/100
Learning rate is set as: 0.013636363636363634
Shuffle: first input word list: [269, 14]
Instance: 4000; Time: 21.66s; loss: 55451.4020; acc: 43960.0/57661.0=0.7624
Instance: 8000; Time: 21.08s; loss: 59527.4275; acc: 87524.0/114649.0=0.7634
Instance: 12000; Time: 20.77s; loss: 69982.1991; acc: 130951.0/173361.0=0.7554
Instance: 14041; Time: 10.91s; loss: 29146.8357; acc: 153873.0/203621.0=0.7557
Epoch: 2 training finished. Time: 74.41s, speed: 188.71st/s, total loss: 214107.8642578125
totalloss: 214107.8642578125
Right token = 42757 All token = 51362 acc = 0.8324636891086795
Dev: time: 4.53s, speed: 724.57st/s; acc: 0.8325, p: 0.1667, r: 0.0002, f: 0.0003
Right token = 38323 All token = 46435 acc = 0.8253041886508022
Test: time: 4.51s, speed: 775.19st/s; acc: 0.8253, p: 0.0000, r: 0.0000, f: -1.0000
Epoch: 3/100
Learning rate is set as: 0.013043478260869566
Shuffle: first input word list: [1919, 72, 1373, 1920, 14]
Instance: 4000; Time: 21.04s; loss: 47456.8503; acc: 46273.0/58518.0=0.7907
Instance: 8000; Time: 21.17s; loss: 51338.7653; acc: 90825.0/116563.0=0.7792
Instance: 12000; Time: 20.69s; loss: 49514.3057; acc: 135771.0/174081.0=0.7799
Instance: 14041; Time: 10.49s; loss: 26523.3854; acc: 158471.0/203621.0=0.7783
Epoch: 3 training finished. Time: 73.39s, speed: 191.33st/s, total loss: 174833.306640625
totalloss: 174833.306640625
Right token = 42752 All token = 51362 acc = 0.8323663408745765
Dev: time: 4.38s, speed: 749.43st/s; acc: 0.8324, p: 0.0714, r: 0.0002, f: 0.0003
Right token = 38315 All token = 46435 acc = 0.8251319048131797
Test: time: 3.18s, speed: 1101.28st/s; acc: 0.8251, p: 0.0435, r: 0.0002, f: 0.0004
Epoch: 4/100
Learning rate is set as: 0.0125
Shuffle: first input word list: [7531, 14]
Instance: 4000; Time: 21.25s; loss: 45186.5659; acc: 45306.0/57892.0=0.7826
Instance: 8000; Time: 21.48s; loss: 47879.9988; acc: 90799.0/116218.0=0.7813
Instance: 12000; Time: 20.97s; loss: 47165.0463; acc: 136038.0/173760.0=0.7829
Instance: 14041; Time: 10.98s; loss: 25187.9487; acc: 159160.0/203621.0=0.7816
Epoch: 4 training finished. Time: 74.67s, speed: 188.04st/s, total loss: 165419.5596923828
totalloss: 165419.5596923828
Right token = 42754 All token = 51362 acc = 0.8324052801682178
Dev: time: 3.57s, speed: 921.35st/s; acc: 0.8324, p: 0.2143, r: 0.0005, f: 0.0010
Right token = 38301 All token = 46435 acc = 0.8248304080973403
Test: time: 3.58s, speed: 983.20st/s; acc: 0.8248, p: 0.1515, r: 0.0009, f: 0.0018
Epoch: 5/100
Learning rate is set as: 0.012
Shuffle: first input word list: [59, 135, 1594, 6966, 35, 259, 94, 2492, 87, 90, 39, 793, 1596, 4009, 10]
Instance: 4000; Time: 20.87s; loss: 47385.0153; acc: 45626.0/58632.0=0.7782
Instance: 8000; Time: 21.45s; loss: 44408.2977; acc: 90903.0/115930.0=0.7841
Instance: 12000; Time: 20.68s; loss: 46127.7147; acc: 136152.0/173487.0=0.7848
Instance: 14041; Time: 10.89s; loss: 24793.5476; acc: 159487.0/203621.0=0.7833
Epoch: 5 training finished. Time: 73.89s, speed: 190.03st/s, total loss: 162714.5753173828
totalloss: 162714.5753173828
Right token = 42760 All token = 51362 acc = 0.8325220980491413
Dev: time: 4.05s, speed: 813.70st/s; acc: 0.8325, p: 0.5000, r: 0.0003, f: 0.0007
Right token = 38331 All token = 46435 acc = 0.8254764724884247
Test: time: 3.78s, speed: 923.25st/s; acc: 0.8255, p: 0.7273, r: 0.0014, f: 0.0028
Epoch: 6/100
Learning rate is set as: 0.011538461538461537
Shuffle: first input word list: [2822, 259, 259, 2820, 259, 1006]
Instance: 4000; Time: 21.46s; loss: 45836.3317; acc: 45602.0/57898.0=0.7876
Instance: 8000; Time: 20.92s; loss: 40953.8358; acc: 90251.0/114021.0=0.7915
Instance: 12000; Time: 20.88s; loss: 50871.9891; acc: 137183.0/173830.0=0.7892
Instance: 14041; Time: 10.71s; loss: 25519.3202; acc: 160047.0/203621.0=0.7860
Epoch: 6 training finished. Time: 73.96s, speed: 189.85st/s, total loss: 163181.47680664062
totalloss: 163181.47680664062
Right token = 42759 All token = 51362 acc = 0.8325026284023208
Dev: time: 4.37s, speed: 753.98st/s; acc: 0.8325, p: 0.4000, r: 0.0003, f: 0.0007
Right token = 38329 All token = 46435 acc = 0.825433401529019
Test: time: 3.93s, speed: 887.74st/s; acc: 0.8254, p: 0.7500, r: 0.0011, f: 0.0021
Epoch: 7/100
Learning rate is set as: 0.01111111111111111
Shuffle: first input word list: [4516, 259, 259, 657, 657, 513, 513, 259]
Instance: 4000; Time: 21.09s; loss: 43271.3885; acc: 47082.0/58467.0=0.8053
Instance: 8000; Time: 21.53s; loss: 45434.3374; acc: 92604.0/116288.0=0.7963
Instance: 12000; Time: 20.87s; loss: 44401.8308; acc: 138572.0/174221.0=0.7954
Instance: 14041; Time: 10.73s; loss: 20354.4045; acc: 162068.0/203621.0=0.7959
Epoch: 7 training finished. Time: 74.23s, speed: 189.15st/s, total loss: 153461.96130371094
totalloss: 153461.96130371094
Right token = 42754 All token = 51362 acc = 0.8324052801682178
Dev: time: 3.75s, speed: 879.98st/s; acc: 0.8324, p: 0.2000, r: 0.0005, f: 0.0010
Right token = 38318 All token = 46435 acc = 0.8251965112522881
Test: time: 3.95s, speed: 883.15st/s; acc: 0.8252, p: 0.4286, r: 0.0016, f: 0.0032
Epoch: 8/100
Learning rate is set as: 0.010714285714285714
Shuffle: first input word list: [83, 18, 474, 237, 11590, 484, 237, 41, 1338, 250, 57, 259, 260, 3554, 3555, 203, 21, 57, 5553, 6, 1181, 363, 8919, 10]
Instance: 4000; Time: 20.31s; loss: 40530.5502; acc: 47295.0/58444.0=0.8092
Instance: 8000; Time: 21.16s; loss: 43322.9246; acc: 93888.0/117047.0=0.8021
Instance: 12000; Time: 22.01s; loss: 43372.3680; acc: 139805.0/174641.0=0.8005
Instance: 14041; Time: 10.78s; loss: 22910.1526; acc: 162543.0/203621.0=0.7983
Epoch: 8 training finished. Time: 74.26s, speed: 189.08st/s, total loss: 150135.99536132812
totalloss: 150135.99536132812
Right token = 42751 All token = 51362 acc = 0.832346871227756
Dev: time: 4.28s, speed: 770.60st/s; acc: 0.8323, p: 0.0000, r: 0.0000, f: -1.0000
Right token = 38315 All token = 46435 acc = 0.8251319048131797
Test: time: 4.37s, speed: 802.09st/s; acc: 0.8251, p: 0.0526, r: 0.0002, f: 0.0004
Epoch: 9/100
Learning rate is set as: 0.010344827586206896
Shuffle: first input word list: [1870, 6715, 6716, 131, 6717, 133, 579]
Instance: 4000; Time: 20.10s; loss: 42776.5460; acc: 46757.0/58147.0=0.8041
Instance: 8000; Time: 21.18s; loss: 45972.1129; acc: 92088.0/115892.0=0.7946
Instance: 12000; Time: 21.80s; loss: 39922.2743; acc: 139043.0/173857.0=0.7998
Instance: 14041; Time: 10.79s; loss: 20447.1543; acc: 163247.0/203621.0=0.8017
Epoch: 9 training finished. Time: 73.87s, speed: 190.07st/s, total loss: 149118.08752441406
totalloss: 149118.08752441406
Right token = 42754 All token = 51362 acc = 0.8324052801682178
Dev: time: 4.08s, speed: 807.30st/s; acc: 0.8324, p: 0.0000, r: 0.0000, f: -1.0000
Right token = 38322 All token = 46435 acc = 0.8252826531710994
Test: time: 4.20s, speed: 829.47st/s; acc: 0.8253, p: 0.0909, r: 0.0002, f: 0.0004
Epoch: 10/100
Learning rate is set as: 0.01
Shuffle: first input word list: [2567, 39, 513, 1858, 6605]
Instance: 4000; Time: 20.74s; loss: 44515.7240; acc: 45471.0/57824.0=0.7864
Instance: 8000; Time: 21.38s; loss: 43261.1500; acc: 91942.0/116208.0=0.7912
Instance: 12000; Time: 20.97s; loss: 41170.6157; acc: 138384.0/173883.0=0.7958
Instance: 14041; Time: 10.84s; loss: 19026.5566; acc: 162906.0/203621.0=0.8000
Epoch: 10 training finished. Time: 73.93s, speed: 189.93st/s, total loss: 147974.04638671875
totalloss: 147974.04638671875
Right token = 42759 All token = 51362 acc = 0.8325026284023208
Dev: time: 4.90s, speed: 668.88st/s; acc: 0.8325, p: 0.5000, r: 0.0003, f: 0.0007
Right token = 38329 All token = 46435 acc = 0.825433401529019
Test: time: 3.81s, speed: 916.40st/s; acc: 0.8254, p: 0.7500, r: 0.0011, f: 0.0021
Jie Yang commented
That's weird. It seems like the model didn't train well. How about setting batch_size=10? Have you tried that?
nnakamura3 commented
Oh, I forgot to change batch_size.
I tried batch_size=10 and found that the model now trains properly.
Thank you for your advice!