McSinyx / viwikipi

Vietnamese Wikipedia Paraphrase Identity experiments

Benchmarks for different pretrained models

McSinyx opened this issue

Please post your benchmarks (F1, accuracy, loss, etc.) here, along with instructions to reproduce them. Training time would also be helpful.

Models to try out:

  • bert-base-multilingual-cased
  • xlm-mlm-17-1280

I believe that to achieve remarkably better accuracy, we'll need to tweak or add to the current training data, though.

The GTX 1080 (8 GB) could not load the xlm-mlm-17-1280 model, even with batch_size = 2. If you want to run this model, using a TPU is recommended!

Two other XLM models trigger the message "F-score is ill-defined and being set to 0.0": they assign every example to ONE class. The reason might be that the language was not set, so we have to set the language to Vietnamese before training!
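To illustrate why a model that puts everything into one class produces that message, here is a minimal pure-Python sketch of binary precision, recall and F1. The function name `precision_recall_f1` and the toy labels are made up for illustration; the fallback to 0.0 when a ratio is undefined mirrors what the quoted message describes and is not taken from the project's code.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall and F1; undefined ratios fall back to 0.0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # With no positive predictions, tp + fp == 0 and precision is 0/0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels (hypothetical): two paraphrase pairs out of six.
y_true = [1, 0, 0, 1, 0, 0]
all_negative = [0] * len(y_true)  # a model that predicts a single class

accuracy = sum(t == p for t, p in zip(y_true, all_negative)) / len(y_true)
print("accuracy:", accuracy)
print("precision, recall, f1:", precision_recall_f1(y_true, all_negative))
```

Accuracy can still look acceptable on an imbalanced dev set in this situation, which is why F1 is the more telling number here.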

It does not seem that a TPU can load xlm-mlm-17-1280 either. It would be a bit strange if MLM/TLM required specifying the language, though, since they were created specifically for dealing with multilingual text.

BERT overfits very well (?), but the F1 on the dev set does not improve over time.
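One way to check this without reading the numbers by eye is to parse the evaluation lines of the training log posted in this thread. The sketch below is only an illustration: `parse_log`, `best_by_f1`, `seconds_between_evals` and `loss_per_step` are hypothetical helper names, and `SAMPLE` holds the first two checkpoints copied from that log. It also assumes that the value logged as "average loss" is really a running sum of the per-step loss (it keeps growing across checkpoints), so the increment between two checkpoints divided by the number of steps is used as a rough per-step training loss. That reading is a guess from the numbers, not something confirmed by the training script.

```python
import re
from datetime import datetime

SAMPLE = """\
2019-11-20 16:22:25.195 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-314 *****
2019-11-20 16:22:25.196 | DEBUG    | __main__:evaluate:285 - acc = 0.8155940594059405
2019-11-20 16:22:25.196 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7418727535337053
2019-11-20 16:22:25.196 | DEBUG    | __main__:evaluate:285 - f1 = 0.6681514476614699
2019-11-20 16:22:25.198 | DEBUG    | __main__:train:184 - global_step = 314, average loss = 130.866460300982
2019-11-20 16:25:12.084 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-628 *****
2019-11-20 16:25:12.085 | DEBUG    | __main__:evaluate:285 - acc = 0.8417904290429042
2019-11-20 16:25:12.085 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7588757806389004
2019-11-20 16:25:12.085 | DEBUG    | __main__:evaluate:285 - f1 = 0.6759611322348964
2019-11-20 16:25:12.087 | DEBUG    | __main__:train:184 - global_step = 628, average loss = 214.11896255612373
"""

EVAL_HEADER = re.compile(r"^(\S+ \S+) \|.*Eval results checkpoint-(\d+)")
METRIC = re.compile(r"- (acc|acc_and_f1|f1) = ([0-9.]+)\s*$")
LOSS = re.compile(r"global_step = (\d+), average loss = ([0-9.]+)")


def parse_log(lines):
    """Return one dict per evaluated checkpoint, in log order."""
    records, current = [], None
    for line in lines:
        header = EVAL_HEADER.search(line)
        if header:
            stamp = datetime.strptime(header.group(1), "%Y-%m-%d %H:%M:%S.%f")
            current = {"time": stamp, "step": int(header.group(2))}
            records.append(current)
            continue
        metric = METRIC.search(line)
        if metric and current is not None:
            current[metric.group(1)] = float(metric.group(2))
            continue
        loss = LOSS.search(line)
        if loss and current is not None and int(loss.group(1)) == current["step"]:
            current["logged_loss"] = float(loss.group(2))
    return records


def best_by_f1(records):
    """Checkpoint with the highest dev F1."""
    return max(records, key=lambda r: r["f1"])


def seconds_between_evals(records):
    """Wall-clock seconds between consecutive evaluations."""
    return [(b["time"] - a["time"]).total_seconds()
            for a, b in zip(records, records[1:])]


def loss_per_step(records):
    """Per-step training loss, ASSUMING the logged value is a running sum."""
    result, prev_loss, prev_step = [], 0.0, 0
    for r in records:
        result.append((r["logged_loss"] - prev_loss) / (r["step"] - prev_step))
        prev_loss, prev_step = r["logged_loss"], r["step"]
    return result


records = parse_log(SAMPLE.splitlines())
best = best_by_f1(records)
print("best dev F1:", best["f1"], "at step", best["step"])
print("seconds between evals:", seconds_between_evals(records))
print("loss per step (assumed):", loss_per_step(records))
```

Feeding the full log file to `parse_log` instead of `SAMPLE` gives the same summary for all checkpoints. If the per-step training loss keeps falling while the best dev F1 stays in a narrow band, that would support the overfitting reading.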

Here is the log of 69 epochs:

2019-11-20 16:18:56.801 | DEBUG    | __main__:<module>:485 - Process rank: -1, device: cuda, n_gpu: 1, distributed training: False
2019-11-20 16:19:31.480 | DEBUG    | __main__:<module>:520 - Training/evaluation parameters Namespace(adam_epsilon=1e-08, cache_dir=None, config=None, cuda=True, data_dir='mrpc', device=device(type='cuda'), eval=True, eval_all_checkpoints=False, eval_while_train=True, gradient_accumulation_steps=1, learning_rate=5e-05, local_rank=-1, log_file='bert/2019-11-20T16:18:53+00:00.log', logging_steps=None, lower_case=False, max_grad_norm=1.0, max_seq=128, max_steps=-1, model='bert-base-multilingual-cased', model_type='bert', n_gpu=1, num_train_epochs=69.0, output_dir='bert', output_mode='classification', overwrite_cache=True, overwrite_output_dir=True, per_gpu_eval_batch_size=8, per_gpu_train_batch_size=32, save_steps=None, seed=42, server_ip=None, server_port=None, task='mrpc', tokenizer=None, train=True, warmup_steps=0, weight_decay=0.0)
2019-11-20 16:22:25.195 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-314 *****
2019-11-20 16:22:25.196 | DEBUG    | __main__:evaluate:285 - acc = 0.8155940594059405
2019-11-20 16:22:25.196 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7418727535337053
2019-11-20 16:22:25.196 | DEBUG    | __main__:evaluate:285 - f1 = 0.6681514476614699
2019-11-20 16:22:25.198 | DEBUG    | __main__:train:184 - global_step = 314, average loss = 130.866460300982
2019-11-20 16:25:12.084 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-628 *****
2019-11-20 16:25:12.085 | DEBUG    | __main__:evaluate:285 - acc = 0.8417904290429042
2019-11-20 16:25:12.085 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7588757806389004
2019-11-20 16:25:12.085 | DEBUG    | __main__:evaluate:285 - f1 = 0.6759611322348964
2019-11-20 16:25:12.087 | DEBUG    | __main__:train:184 - global_step = 628, average loss = 214.11896255612373
2019-11-20 16:27:59.503 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-942 *****
2019-11-20 16:27:59.503 | DEBUG    | __main__:evaluate:285 - acc = 0.8428217821782178
2019-11-20 16:27:59.503 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7459965264482249
2019-11-20 16:27:59.504 | DEBUG    | __main__:evaluate:285 - f1 = 0.649171270718232
2019-11-20 16:27:59.506 | DEBUG    | __main__:train:184 - global_step = 942, average loss = 270.4700103774667
2019-11-20 16:30:46.416 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-1256 *****
2019-11-20 16:30:46.417 | DEBUG    | __main__:evaluate:285 - acc = 0.846947194719472
2019-11-20 16:30:46.417 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7534460996512117
2019-11-20 16:30:46.417 | DEBUG    | __main__:evaluate:285 - f1 = 0.6599450045829514
2019-11-20 16:30:46.419 | DEBUG    | __main__:train:184 - global_step = 1256, average loss = 311.1270676050335
2019-11-20 16:33:33.099 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-1570 *****
2019-11-20 16:33:33.099 | DEBUG    | __main__:evaluate:285 - acc = 0.8397277227722773
2019-11-20 16:33:33.100 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7463488993495151
2019-11-20 16:33:33.100 | DEBUG    | __main__:evaluate:285 - f1 = 0.6529700759267529
2019-11-20 16:33:33.101 | DEBUG    | __main__:train:184 - global_step = 1570, average loss = 347.3645346928388
2019-11-20 16:36:20.095 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-1884 *****
2019-11-20 16:36:20.096 | DEBUG    | __main__:evaluate:285 - acc = 0.8399339933993399
2019-11-20 16:36:20.096 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7478285761140444
2019-11-20 16:36:20.096 | DEBUG    | __main__:evaluate:285 - f1 = 0.6557231588287488
2019-11-20 16:36:20.098 | DEBUG    | __main__:train:184 - global_step = 1884, average loss = 377.76026828959584
2019-11-20 16:39:06.362 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-2198 *****
2019-11-20 16:39:06.362 | DEBUG    | __main__:evaluate:285 - acc = 0.8417904290429042
2019-11-20 16:39:06.362 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7474442873391816
2019-11-20 16:39:06.363 | DEBUG    | __main__:evaluate:285 - f1 = 0.653098145635459
2019-11-20 16:39:06.366 | DEBUG    | __main__:train:184 - global_step = 2198, average loss = 402.24665041058324
2019-11-20 16:41:52.641 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-2512 *****
2019-11-20 16:41:52.641 | DEBUG    | __main__:evaluate:285 - acc = 0.8450907590759076
2019-11-20 16:41:52.641 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.746171495085159
2019-11-20 16:41:52.642 | DEBUG    | __main__:evaluate:285 - f1 = 0.6472522310944105
2019-11-20 16:41:52.643 | DEBUG    | __main__:train:184 - global_step = 2512, average loss = 423.558560061967
2019-11-20 16:44:39.135 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-2826 *****
2019-11-20 16:44:39.135 | DEBUG    | __main__:evaluate:285 - acc = 0.8331270627062707
2019-11-20 16:44:39.135 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7456721076100006
2019-11-20 16:44:39.135 | DEBUG    | __main__:evaluate:285 - f1 = 0.6582171525137305
2019-11-20 16:44:39.137 | DEBUG    | __main__:train:184 - global_step = 2826, average loss = 441.15046172053553
2019-11-20 16:47:25.462 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-3140 *****
2019-11-20 16:47:25.462 | DEBUG    | __main__:evaluate:285 - acc = 0.8310643564356436
2019-11-20 16:47:25.463 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.748457514815129
2019-11-20 16:47:25.463 | DEBUG    | __main__:evaluate:285 - f1 = 0.6658506731946144
2019-11-20 16:47:25.465 | DEBUG    | __main__:train:184 - global_step = 3140, average loss = 457.6777255741181
2019-11-20 16:50:11.674 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-3454 *****
2019-11-20 16:50:11.675 | DEBUG    | __main__:evaluate:285 - acc = 0.840552805280528
2019-11-20 16:50:11.675 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7445146927721421
2019-11-20 16:50:11.675 | DEBUG    | __main__:evaluate:285 - f1 = 0.6484765802637562
2019-11-20 16:50:11.677 | DEBUG    | __main__:train:184 - global_step = 3454, average loss = 471.82956717647903
2019-11-20 16:52:57.966 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-3768 *****
2019-11-20 16:52:57.966 | DEBUG    | __main__:evaluate:285 - acc = 0.8360148514851485
2019-11-20 16:52:57.966 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7269934901346935
2019-11-20 16:52:57.966 | DEBUG    | __main__:evaluate:285 - f1 = 0.6179721287842385
2019-11-20 16:52:57.968 | DEBUG    | __main__:train:184 - global_step = 3768, average loss = 483.93517398760014
2019-11-20 16:55:44.281 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-4082 *****
2019-11-20 16:55:44.281 | DEBUG    | __main__:evaluate:285 - acc = 0.8382838283828383
2019-11-20 16:55:44.281 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7562158959038049
2019-11-20 16:55:44.281 | DEBUG    | __main__:evaluate:285 - f1 = 0.6741479634247715
2019-11-20 16:55:44.283 | DEBUG    | __main__:train:184 - global_step = 4082, average loss = 496.52013974505826
2019-11-20 16:58:30.621 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-4396 *****
2019-11-20 16:58:30.621 | DEBUG    | __main__:evaluate:285 - acc = 0.8485973597359736
2019-11-20 16:58:30.622 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7598721565704958
2019-11-20 16:58:30.622 | DEBUG    | __main__:evaluate:285 - f1 = 0.671146953405018
2019-11-20 16:58:30.624 | DEBUG    | __main__:train:184 - global_step = 4396, average loss = 509.6342240416998
2019-11-20 17:01:17.804 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-4710 *****
2019-11-20 17:01:17.804 | DEBUG    | __main__:evaluate:285 - acc = 0.8452970297029703
2019-11-20 17:01:17.804 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7487153052039899
2019-11-20 17:01:17.805 | DEBUG    | __main__:evaluate:285 - f1 = 0.6521335807050094
2019-11-20 17:01:17.806 | DEBUG    | __main__:train:184 - global_step = 4710, average loss = 525.1504652922595
2019-11-20 17:04:04.141 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-5024 *****
2019-11-20 17:04:04.141 | DEBUG    | __main__:evaluate:285 - acc = 0.8424092409240924
2019-11-20 17:04:04.141 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7530708176451448
2019-11-20 17:04:04.142 | DEBUG    | __main__:evaluate:285 - f1 = 0.6637323943661972
2019-11-20 17:04:04.144 | DEBUG    | __main__:train:184 - global_step = 5024, average loss = 535.4719151119789
2019-11-20 17:06:51.118 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-5338 *****
2019-11-20 17:06:51.119 | DEBUG    | __main__:evaluate:285 - acc = 0.8364273927392739
2019-11-20 17:06:51.119 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7461963427904612
2019-11-20 17:06:51.119 | DEBUG    | __main__:evaluate:285 - f1 = 0.6559652928416486
2019-11-20 17:06:51.121 | DEBUG    | __main__:train:184 - global_step = 5338, average loss = 542.7603054482461
2019-11-20 17:09:37.232 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-5652 *****
2019-11-20 17:09:37.232 | DEBUG    | __main__:evaluate:285 - acc = 0.835602310231023
2019-11-20 17:09:37.232 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.750011681431301
2019-11-20 17:09:37.233 | DEBUG    | __main__:evaluate:285 - f1 = 0.664421052631579
2019-11-20 17:09:37.234 | DEBUG    | __main__:train:184 - global_step = 5652, average loss = 550.4372036239511
2019-11-20 17:12:23.414 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-5966 *****
2019-11-20 17:12:23.414 | DEBUG    | __main__:evaluate:285 - acc = 0.8335396039603961
2019-11-20 17:12:23.414 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7293568897646847
2019-11-20 17:12:23.414 | DEBUG    | __main__:evaluate:285 - f1 = 0.6251741755689735
2019-11-20 17:12:23.428 | DEBUG    | __main__:train:184 - global_step = 5966, average loss = 559.5997352732084
2019-11-20 17:15:09.623 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-6280 *****
2019-11-20 17:15:09.623 | DEBUG    | __main__:evaluate:285 - acc = 0.846947194719472
2019-11-20 17:15:09.623 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.759604339409206
2019-11-20 17:15:09.623 | DEBUG    | __main__:evaluate:285 - f1 = 0.6722614840989399
2019-11-20 17:15:09.625 | DEBUG    | __main__:train:184 - global_step = 6280, average loss = 569.7800986047805
2019-11-20 17:17:55.712 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-6594 *****
2019-11-20 17:17:55.712 | DEBUG    | __main__:evaluate:285 - acc = 0.8358085808580858
2019-11-20 17:17:55.712 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7467950900851049
2019-11-20 17:17:55.712 | DEBUG    | __main__:evaluate:285 - f1 = 0.657781599312124
2019-11-20 17:17:55.714 | DEBUG    | __main__:train:184 - global_step = 6594, average loss = 577.7438732692535
2019-11-20 17:20:42.096 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-6908 *****
2019-11-20 17:20:42.096 | DEBUG    | __main__:evaluate:285 - acc = 0.8261138613861386
2019-11-20 17:20:42.097 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7442543676173785
2019-11-20 17:20:42.097 | DEBUG    | __main__:evaluate:285 - f1 = 0.6623948738486183
2019-11-20 17:20:42.098 | DEBUG    | __main__:train:184 - global_step = 6908, average loss = 585.9919928581949
2019-11-20 17:23:28.377 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-7222 *****
2019-11-20 17:23:28.378 | DEBUG    | __main__:evaluate:285 - acc = 0.8380775577557755
2019-11-20 17:23:28.378 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.744981129210482
2019-11-20 17:23:28.378 | DEBUG    | __main__:evaluate:285 - f1 = 0.6518847006651886
2019-11-20 17:23:28.380 | DEBUG    | __main__:train:184 - global_step = 7222, average loss = 593.8171342992719
2019-11-20 17:26:14.662 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-7536 *****
2019-11-20 17:26:14.663 | DEBUG    | __main__:evaluate:285 - acc = 0.8378712871287128
2019-11-20 17:26:14.663 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7512735275234008
2019-11-20 17:26:14.663 | DEBUG    | __main__:evaluate:285 - f1 = 0.6646757679180887
2019-11-20 17:26:14.665 | DEBUG    | __main__:train:184 - global_step = 7536, average loss = 602.1226710013143
2019-11-20 17:29:00.750 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-7850 *****
2019-11-20 17:29:00.750 | DEBUG    | __main__:evaluate:285 - acc = 0.8364273927392739
2019-11-20 17:29:00.750 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7536225797237257
2019-11-20 17:29:00.750 | DEBUG    | __main__:evaluate:285 - f1 = 0.6708177667081776
2019-11-20 17:29:00.752 | DEBUG    | __main__:train:184 - global_step = 7850, average loss = 607.0698662492941
2019-11-20 17:31:46.802 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-8164 *****
2019-11-20 17:31:46.803 | DEBUG    | __main__:evaluate:285 - acc = 0.8461221122112211
2019-11-20 17:31:46.803 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7640670237100436
2019-11-20 17:31:46.803 | DEBUG    | __main__:evaluate:285 - f1 = 0.6820119352088662
2019-11-20 17:31:46.805 | DEBUG    | __main__:train:184 - global_step = 8164, average loss = 613.1773460193217
2019-11-20 17:34:32.895 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-8478 *****
2019-11-20 17:34:32.895 | DEBUG    | __main__:evaluate:285 - acc = 0.8438531353135313
2019-11-20 17:34:32.895 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7438089205979421
2019-11-20 17:34:32.895 | DEBUG    | __main__:evaluate:285 - f1 = 0.6437647058823529
2019-11-20 17:34:32.897 | DEBUG    | __main__:train:184 - global_step = 8478, average loss = 616.8636786869247
2019-11-20 17:37:18.917 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-8792 *****
2019-11-20 17:37:18.918 | DEBUG    | __main__:evaluate:285 - acc = 0.8407590759075908
2019-11-20 17:37:18.918 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7503354850903592
2019-11-20 17:37:18.918 | DEBUG    | __main__:evaluate:285 - f1 = 0.6599118942731277
2019-11-20 17:37:18.920 | DEBUG    | __main__:train:184 - global_step = 8792, average loss = 621.6626086741671
2019-11-20 17:40:05.082 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-9106 *****
2019-11-20 17:40:05.082 | DEBUG    | __main__:evaluate:285 - acc = 0.8341584158415841
2019-11-20 17:40:05.083 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7533984261618344
2019-11-20 17:40:05.083 | DEBUG    | __main__:evaluate:285 - f1 = 0.6726384364820847
2019-11-20 17:40:05.085 | DEBUG    | __main__:train:184 - global_step = 9106, average loss = 626.456270354669
2019-11-20 17:42:51.460 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-9420 *****
2019-11-20 17:42:51.461 | DEBUG    | __main__:evaluate:285 - acc = 0.8236386138613861
2019-11-20 17:42:51.461 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7457998894549649
2019-11-20 17:42:51.461 | DEBUG    | __main__:evaluate:285 - f1 = 0.6679611650485437
2019-11-20 17:42:51.463 | DEBUG    | __main__:train:184 - global_step = 9420, average loss = 632.4476002656011
2019-11-20 17:45:37.417 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-9734 *****
2019-11-20 17:45:37.417 | DEBUG    | __main__:evaluate:285 - acc = 0.8378712871287128
2019-11-20 17:45:37.417 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7498306521702085
2019-11-20 17:45:37.417 | DEBUG    | __main__:evaluate:285 - f1 = 0.661790017211704
2019-11-20 17:45:37.419 | DEBUG    | __main__:train:184 - global_step = 9734, average loss = 636.470879416549
2019-11-20 17:48:24.273 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-10048 *****
2019-11-20 17:48:24.273 | DEBUG    | __main__:evaluate:285 - acc = 0.834983498349835
2019-11-20 17:48:24.273 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.74034383066296
2019-11-20 17:48:24.274 | DEBUG    | __main__:evaluate:285 - f1 = 0.645704162976085
2019-11-20 17:48:24.275 | DEBUG    | __main__:train:184 - global_step = 10048, average loss = 642.8049634933559
2019-11-20 17:51:10.425 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-10362 *****
2019-11-20 17:51:10.426 | DEBUG    | __main__:evaluate:285 - acc = 0.8395214521452146
2019-11-20 17:51:10.426 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7471786497638229
2019-11-20 17:51:10.426 | DEBUG    | __main__:evaluate:285 - f1 = 0.6548358473824313
2019-11-20 17:51:10.428 | DEBUG    | __main__:train:184 - global_step = 10362, average loss = 647.4692237494601
2019-11-20 17:53:56.339 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-10676 *****
2019-11-20 17:53:56.340 | DEBUG    | __main__:evaluate:285 - acc = 0.8292079207920792
2019-11-20 17:53:56.340 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7557398007644203
2019-11-20 17:53:56.340 | DEBUG    | __main__:evaluate:285 - f1 = 0.6822716807367614
2019-11-20 17:53:56.342 | DEBUG    | __main__:train:184 - global_step = 10676, average loss = 649.9958456834866
2019-11-20 17:56:42.443 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-10990 *****
2019-11-20 17:56:42.443 | DEBUG    | __main__:evaluate:285 - acc = 0.8331270627062707
2019-11-20 17:56:42.443 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.754569138081209
2019-11-20 17:56:42.444 | DEBUG    | __main__:evaluate:285 - f1 = 0.6760112134561473
2019-11-20 17:56:42.445 | DEBUG    | __main__:train:184 - global_step = 10990, average loss = 653.8833062955382
2019-11-20 17:59:28.371 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-11304 *****
2019-11-20 17:59:28.372 | DEBUG    | __main__:evaluate:285 - acc = 0.8409653465346535
2019-11-20 17:59:28.372 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7576182879694814
2019-11-20 17:59:28.372 | DEBUG    | __main__:evaluate:285 - f1 = 0.6742712294043093
2019-11-20 17:59:28.374 | DEBUG    | __main__:train:184 - global_step = 11304, average loss = 656.4278341530298
2019-11-20 18:02:14.633 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-11618 *****
2019-11-20 18:02:14.633 | DEBUG    | __main__:evaluate:285 - acc = 0.8366336633663366
2019-11-20 18:02:14.633 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7607989319218318
2019-11-20 18:02:14.634 | DEBUG    | __main__:evaluate:285 - f1 = 0.684964200477327
2019-11-20 18:02:14.636 | DEBUG    | __main__:train:184 - global_step = 11618, average loss = 658.0569205837874
2019-11-20 18:05:00.611 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-11932 *****
2019-11-20 18:05:00.611 | DEBUG    | __main__:evaluate:285 - acc = 0.8407590759075908
2019-11-20 18:05:00.611 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.756681488759563
2019-11-20 18:05:00.612 | DEBUG    | __main__:evaluate:285 - f1 = 0.6726039016115352
2019-11-20 18:05:00.613 | DEBUG    | __main__:train:184 - global_step = 11932, average loss = 660.7568418524697
2019-11-20 18:07:46.680 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-12246 *****
2019-11-20 18:07:46.680 | DEBUG    | __main__:evaluate:285 - acc = 0.8372524752475248
2019-11-20 18:07:46.681 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.749530652540179
2019-11-20 18:07:46.681 | DEBUG    | __main__:evaluate:285 - f1 = 0.6618088298328333
2019-11-20 18:07:46.682 | DEBUG    | __main__:train:184 - global_step = 12246, average loss = 663.0300147065518
2019-11-20 18:10:32.697 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-12560 *****
2019-11-20 18:10:32.697 | DEBUG    | __main__:evaluate:285 - acc = 0.8378712871287128
2019-11-20 18:10:32.698 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7573402488275143
2019-11-20 18:10:32.699 | DEBUG    | __main__:evaluate:285 - f1 = 0.6768092105263158
2019-11-20 18:10:32.700 | DEBUG    | __main__:train:184 - global_step = 12560, average loss = 665.0071237504835
2019-11-20 18:13:19.419 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-12874 *****
2019-11-20 18:13:19.419 | DEBUG    | __main__:evaluate:285 - acc = 0.8378712871287128
2019-11-20 18:13:19.419 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.744113579507417
2019-11-20 18:13:19.420 | DEBUG    | __main__:evaluate:285 - f1 = 0.650355871886121
2019-11-20 18:13:19.421 | DEBUG    | __main__:train:184 - global_step = 12874, average loss = 668.1252887135015
2019-11-20 18:16:05.798 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-13188 *****
2019-11-20 18:16:05.798 | DEBUG    | __main__:evaluate:285 - acc = 0.8279702970297029
2019-11-20 18:16:05.799 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7492932053868894
2019-11-20 18:16:05.799 | DEBUG    | __main__:evaluate:285 - f1 = 0.6706161137440758
2019-11-20 18:16:05.800 | DEBUG    | __main__:train:184 - global_step = 13188, average loss = 670.8774752367863
2019-11-20 18:18:51.805 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-13502 *****
2019-11-20 18:18:51.805 | DEBUG    | __main__:evaluate:285 - acc = 0.8403465346534653
2019-11-20 18:18:51.805 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7523501970752149
2019-11-20 18:18:51.805 | DEBUG    | __main__:evaluate:285 - f1 = 0.6643538594969643
2019-11-20 18:18:51.808 | DEBUG    | __main__:train:184 - global_step = 13502, average loss = 672.8977431547755
2019-11-20 18:21:37.871 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-13816 *****
2019-11-20 18:21:37.871 | DEBUG    | __main__:evaluate:285 - acc = 0.8426155115511551
2019-11-20 18:21:37.871 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7624156274956941
2019-11-20 18:21:37.871 | DEBUG    | __main__:evaluate:285 - f1 = 0.6822157434402332
2019-11-20 18:21:37.873 | DEBUG    | __main__:train:184 - global_step = 13816, average loss = 674.5179278387332
2019-11-20 18:24:24.016 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-14130 *****
2019-11-20 18:24:24.017 | DEBUG    | __main__:evaluate:285 - acc = 0.8391089108910891
2019-11-20 18:24:24.017 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7587960053218843
2019-11-20 18:24:24.017 | DEBUG    | __main__:evaluate:285 - f1 = 0.6784830997526794
2019-11-20 18:24:24.019 | DEBUG    | __main__:train:184 - global_step = 14130, average loss = 676.9487044550051
2019-11-20 18:27:09.899 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-14444 *****
2019-11-20 18:27:09.900 | DEBUG    | __main__:evaluate:285 - acc = 0.8479785478547854
2019-11-20 18:27:09.900 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7577736672517048
2019-11-20 18:27:09.900 | DEBUG    | __main__:evaluate:285 - f1 = 0.6675687866486243
2019-11-20 18:27:09.902 | DEBUG    | __main__:train:184 - global_step = 14444, average loss = 678.1061210342468
2019-11-20 18:29:55.816 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-14758 *****
2019-11-20 18:29:55.816 | DEBUG    | __main__:evaluate:285 - acc = 0.8432343234323433
2019-11-20 18:29:55.817 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7525780158086983
2019-11-20 18:29:55.817 | DEBUG    | __main__:evaluate:285 - f1 = 0.6619217081850534
2019-11-20 18:29:55.818 | DEBUG    | __main__:train:184 - global_step = 14758, average loss = 679.224898127366
2019-11-20 18:32:42.141 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-15072 *****
2019-11-20 18:32:42.141 | DEBUG    | __main__:evaluate:285 - acc = 0.845503300330033
2019-11-20 18:32:42.141 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7584251911917826
2019-11-20 18:32:42.141 | DEBUG    | __main__:evaluate:285 - f1 = 0.6713470820535323
2019-11-20 18:32:42.143 | DEBUG    | __main__:train:184 - global_step = 15072, average loss = 680.838724910167
2019-11-20 18:35:28.100 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-15386 *****
2019-11-20 18:35:28.100 | DEBUG    | __main__:evaluate:285 - acc = 0.8438531353135313
2019-11-20 18:35:28.101 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7565705169796141
2019-11-20 18:35:28.101 | DEBUG    | __main__:evaluate:285 - f1 = 0.6692878986456968
2019-11-20 18:35:28.103 | DEBUG    | __main__:train:184 - global_step = 15386, average loss = 682.0702665669842
2019-11-20 18:38:14.136 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-15700 *****
2019-11-20 18:38:14.136 | DEBUG    | __main__:evaluate:285 - acc = 0.8494224422442245
2019-11-20 18:38:14.136 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7555731302972744
2019-11-20 18:38:14.137 | DEBUG    | __main__:evaluate:285 - f1 = 0.6617238183503243
2019-11-20 18:38:14.138 | DEBUG    | __main__:train:184 - global_step = 15700, average loss = 683.9963068385687
2019-11-20 18:41:00.086 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-16014 *****
2019-11-20 18:41:00.087 | DEBUG    | __main__:evaluate:285 - acc = 0.843440594059406
2019-11-20 18:41:00.087 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7626008001743572
2019-11-20 18:41:00.087 | DEBUG    | __main__:evaluate:285 - f1 = 0.6817610062893082
2019-11-20 18:41:00.089 | DEBUG    | __main__:train:184 - global_step = 16014, average loss = 684.4776186489898
2019-11-20 18:43:46.803 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-16328 *****
2019-11-20 18:43:46.803 | DEBUG    | __main__:evaluate:285 - acc = 0.8436468646864687
2019-11-20 18:43:46.804 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7497071743959137
2019-11-20 18:43:46.804 | DEBUG    | __main__:evaluate:285 - f1 = 0.6557674841053588
2019-11-20 18:43:46.805 | DEBUG    | __main__:train:184 - global_step = 16328, average loss = 686.6617673452411
2019-11-20 18:46:32.816 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-16642 *****
2019-11-20 18:46:32.817 | DEBUG    | __main__:evaluate:285 - acc = 0.843440594059406
2019-11-20 18:46:32.817 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7609833211720156
2019-11-20 18:46:32.817 | DEBUG    | __main__:evaluate:285 - f1 = 0.6785260482846253
2019-11-20 18:46:32.819 | DEBUG    | __main__:train:184 - global_step = 16642, average loss = 687.6655848132632
2019-11-20 18:49:18.831 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-16956 *****
2019-11-20 18:49:18.832 | DEBUG    | __main__:evaluate:285 - acc = 0.8401402640264026
2019-11-20 18:49:18.832 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.766361286317049
2019-11-20 18:49:18.832 | DEBUG    | __main__:evaluate:285 - f1 = 0.6925823086076953
2019-11-20 18:49:18.834 | DEBUG    | __main__:train:184 - global_step = 16956, average loss = 688.9639044378582
2019-11-20 18:52:04.844 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-17270 *****
2019-11-20 18:52:04.845 | DEBUG    | __main__:evaluate:285 - acc = 0.8397277227722773
2019-11-20 18:52:04.845 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7623851422941239
2019-11-20 18:52:04.845 | DEBUG    | __main__:evaluate:285 - f1 = 0.6850425618159707
2019-11-20 18:52:04.847 | DEBUG    | __main__:train:184 - global_step = 17270, average loss = 689.9968900534741
2019-11-20 18:54:50.653 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-17584 *****
2019-11-20 18:54:50.654 | DEBUG    | __main__:evaluate:285 - acc = 0.8465346534653465
2019-11-20 18:54:50.654 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7665697698666244
2019-11-20 18:54:50.654 | DEBUG    | __main__:evaluate:285 - f1 = 0.6866048862679023
2019-11-20 18:54:50.656 | DEBUG    | __main__:train:184 - global_step = 17584, average loss = 690.4660312348888
2019-11-20 18:57:36.681 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-17898 *****
2019-11-20 18:57:36.681 | DEBUG    | __main__:evaluate:285 - acc = 0.8521039603960396
2019-11-20 18:57:36.681 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7692964052089513
2019-11-20 18:57:36.682 | DEBUG    | __main__:evaluate:285 - f1 = 0.6864888500218628
2019-11-20 18:57:36.683 | DEBUG    | __main__:train:184 - global_step = 17898, average loss = 690.8063485145622
2019-11-20 19:00:22.596 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-18212 *****
2019-11-20 19:00:22.597 | DEBUG    | __main__:evaluate:285 - acc = 0.8442656765676567
2019-11-20 19:00:22.597 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7629173174065613
2019-11-20 19:00:22.597 | DEBUG    | __main__:evaluate:285 - f1 = 0.6815689582454659
2019-11-20 19:00:22.599 | DEBUG    | __main__:train:184 - global_step = 18212, average loss = 690.8101217195444
2019-11-20 19:03:09.693 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-18526 *****
2019-11-20 19:03:09.694 | DEBUG    | __main__:evaluate:285 - acc = 0.8413778877887789
2019-11-20 19:03:09.694 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7621322428634616
2019-11-20 19:03:09.694 | DEBUG    | __main__:evaluate:285 - f1 = 0.6828865979381443
2019-11-20 19:03:09.696 | DEBUG    | __main__:train:184 - global_step = 18526, average loss = 690.8136507947911
2019-11-20 19:05:55.735 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-18840 *****
2019-11-20 19:05:55.735 | DEBUG    | __main__:evaluate:285 - acc = 0.8415841584158416
2019-11-20 19:05:55.735 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7610583021696513
2019-11-20 19:05:55.736 | DEBUG    | __main__:evaluate:285 - f1 = 0.680532445923461
2019-11-20 19:05:55.737 | DEBUG    | __main__:train:184 - global_step = 18840, average loss = 690.8155212802435
2019-11-20 19:08:41.534 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-19154 *****
2019-11-20 19:08:41.534 | DEBUG    | __main__:evaluate:285 - acc = 0.8426155115511551
2019-11-20 19:08:41.534 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7616175088103201
2019-11-20 19:08:41.535 | DEBUG    | __main__:evaluate:285 - f1 = 0.6806195060694852
2019-11-20 19:08:41.536 | DEBUG    | __main__:train:184 - global_step = 19154, average loss = 690.8172838623759
2019-11-20 19:11:27.569 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-19468 *****
2019-11-20 19:11:27.569 | DEBUG    | __main__:evaluate:285 - acc = 0.8337458745874587
2019-11-20 19:11:27.569 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7589575767921619
2019-11-20 19:11:27.569 | DEBUG    | __main__:evaluate:285 - f1 = 0.6841692789968651
2019-11-20 19:11:27.571 | DEBUG    | __main__:train:184 - global_step = 19468, average loss = 690.8207677595942
2019-11-20 19:14:13.382 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-19782 *****
2019-11-20 19:14:13.383 | DEBUG    | __main__:evaluate:285 - acc = 0.8364273927392739
2019-11-20 19:14:13.383 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7608099647300974
2019-11-20 19:14:13.383 | DEBUG    | __main__:evaluate:285 - f1 = 0.685192536720921
2019-11-20 19:14:13.385 | DEBUG    | __main__:train:184 - global_step = 19782, average loss = 691.1296545948026
2019-11-20 19:16:59.239 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-20096 *****
2019-11-20 19:16:59.239 | DEBUG    | __main__:evaluate:285 - acc = 0.8391089108910891
2019-11-20 19:16:59.240 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7634295555256085
2019-11-20 19:16:59.240 | DEBUG    | __main__:evaluate:285 - f1 = 0.687750200160128
2019-11-20 19:16:59.242 | DEBUG    | __main__:train:184 - global_step = 20096, average loss = 691.3343624458628
2019-11-20 19:19:45.115 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-20410 *****
2019-11-20 19:19:45.116 | DEBUG    | __main__:evaluate:285 - acc = 0.8502475247524752
2019-11-20 19:19:45.116 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7661920636021571
2019-11-20 19:19:45.116 | DEBUG    | __main__:evaluate:285 - f1 = 0.6821366024518388
2019-11-20 19:19:45.118 | DEBUG    | __main__:train:184 - global_step = 20410, average loss = 691.625698912063
2019-11-20 19:22:30.924 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-20724 *****
2019-11-20 19:22:30.924 | DEBUG    | __main__:evaluate:285 - acc = 0.8450907590759076
2019-11-20 19:22:30.924 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7641733845991094
2019-11-20 19:22:30.924 | DEBUG    | __main__:evaluate:285 - f1 = 0.6832560101223112
2019-11-20 19:22:30.926 | DEBUG    | __main__:train:184 - global_step = 20724, average loss = 691.6273050085049
2019-11-20 19:25:17.046 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-21038 *****
2019-11-20 19:25:17.046 | DEBUG    | __main__:evaluate:285 - acc = 0.8457095709570958
2019-11-20 19:25:17.046 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.764648355698514
2019-11-20 19:25:17.047 | DEBUG    | __main__:evaluate:285 - f1 = 0.6835871404399322
2019-11-20 19:25:17.048 | DEBUG    | __main__:train:184 - global_step = 21038, average loss = 691.7670859001344
2019-11-20 19:28:02.872 | DEBUG    | __main__:evaluate:283 - ***** Eval results checkpoint-21352 *****
2019-11-20 19:28:02.872 | DEBUG    | __main__:evaluate:285 - acc = 0.8457095709570958
2019-11-20 19:28:02.872 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7641111522018076
2019-11-20 19:28:02.872 | DEBUG    | __main__:evaluate:285 - f1 = 0.6825127334465195
2019-11-20 19:28:02.874 | DEBUG    | __main__:train:184 - global_step = 21352, average loss = 691.9310709817246
2019-11-20 19:29:51.540 | DEBUG    | __main__:<module>:528 - global_step = 21597, average loss = 0.03203832905760044
2019-11-20 19:30:32.211 | DEBUG    | __main__:evaluate:283 - ***** Eval results  *****
2019-11-20 19:30:32.212 | DEBUG    | __main__:evaluate:285 - acc = 0.8457095709570958
2019-11-20 19:30:32.212 | DEBUG    | __main__:evaluate:285 - acc_and_f1 = 0.7639762808056507
2019-11-20 19:30:32.212 | DEBUG    | __main__:evaluate:285 - f1 = 0.6822429906542056

The "average loss" logged at each checkpoint is actually the total loss (I did the arithmetic wrong there), but if one divides it by the global step, it can be seen descending steadily.

The average loss of each checkpoint is actually total loss

Except for the last one?
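A quick way to check both points against the numbers in the BERT log above is to redo the division by hand. This is only a sketch: it assumes the logged value is the cumulative sum of per-step losses since step 0, and the names `checkpoints`, `running_average`, `FINAL_STEP` and `FINAL_AVERAGE` are made up for illustration, not taken from the training script.

```python
# (global_step, logged "average loss") pairs copied from the log above.
# Assumption: the logged value is the cumulative (total) training loss.
checkpoints = [
    (18526, 690.8136507947911),
    (18840, 690.8155212802435),
    (19154, 690.8172838623759),
    (19468, 690.8207677595942),
    (19782, 691.1296545948026),
    (20096, 691.3343624458628),
    (20410, 691.625698912063),
    (20724, 691.6273050085049),
    (21038, 691.7670859001344),
    (21352, 691.9310709817246),
]

# The final line of the log, which looks like a real average already.
FINAL_STEP = 21597
FINAL_AVERAGE = 0.03203832905760044


def running_average(total_loss, global_step):
    """Mean loss per step over all steps so far (not per epoch)."""
    return total_loss / global_step


averages = [running_average(total, step) for step, total in checkpoints]
for (step, _), average in zip(checkpoints, averages):
    print(f"step {step}: running average loss = {average:.6f}")

# Multiplying the final average back by its step count gives the total
# that the final line would correspond to.
implied_total = FINAL_AVERAGE * FINAL_STEP
print(f"step {FINAL_STEP}: implied total loss = {implied_total:.4f}")
```

Under that assumption the running average falls at every checkpoint listed, and the implied total for the final line is close to the total logged at checkpoint-21352, which is consistent with only the last value having been divided by the global step. Note that this is a mean over all steps since the start of training, so it changes slowly and says little about the loss within a single epoch.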

With Adam ϵ=1e-4 and learning rate=5e-6, after 4 training epochs, the BERT model gives an F1 of approximately 0.8.

XLM seems to be a few times faster:

2019-11-27 15:10:59.698 | DEBUG    | __main__:<module>:491 - Process rank: -1, device: cuda, n_gpu: 1, distributed training: False
2019-11-27 15:11:53.914 | DEBUG    | __main__:<module>:526 - Training/evaluation parameters Namespace(adam_epsilon=0.0001, cache_dir=None, config=None, cuda=True, data_dir='mrpc', device=device(type='cuda'), eval=True, eval_all_checkpoints=False, eval_while_train=True, gradient_accumulation_steps=1, learning_rate=5e-06, local_rank=-1, log_file='xlm/2019-11-27T15:10:57+00:00.log', logging_steps=None, lower_case=False, max_grad_norm=1.0, max_seq=128, max_steps=-1, model='xlm-mlm-17-1280', model_type='xlm', n_gpu=1, num_train_epochs=3.0, output_dir='xlm', output_mode='classification', overwrite_cache=True, overwrite_output_dir=True, per_gpu_eval_batch_size=8, per_gpu_train_batch_size=8, save_steps=None, seed=42, server_ip=None, server_port=None, task='mrpc', tokenizer=None, train=True, warmup_steps=0, weight_decay=0.0)
2019-11-27 15:25:18.420 | DEBUG    | __main__:evaluate:290 - ***** Eval results checkpoint-1551 *****
2019-11-27 15:25:18.420 | DEBUG    | __main__:evaluate:292 - acc = 0.8576979116075765
2019-11-27 15:25:18.421 | DEBUG    | __main__:evaluate:292 - acc_and_f1 = 0.8389900312484826
2019-11-27 15:25:18.421 | DEBUG    | __main__:evaluate:292 - f1 = 0.8202821508893887
2019-11-27 15:25:18.421 | DEBUG    | __main__:evaluate:292 - loss = 0.3800258531800217
2019-11-27 15:25:18.421 | DEBUG    | __main__:train:189 - average loss = 0.5096873833379347
2019-11-27 15:38:34.338 | DEBUG    | __main__:evaluate:290 - ***** Eval results checkpoint-3102 *****
2019-11-27 15:38:34.339 | DEBUG    | __main__:evaluate:292 - acc = 0.8848955803788247
2019-11-27 15:38:34.339 | DEBUG    | __main__:evaluate:292 - acc_and_f1 = 0.8674319640657576
2019-11-27 15:38:34.339 | DEBUG    | __main__:evaluate:292 - f1 = 0.8499683477526906
2019-11-27 15:38:34.339 | DEBUG    | __main__:evaluate:292 - loss = 0.41474137980283693
2019-11-27 15:38:34.340 | DEBUG    | __main__:train:189 - average loss = 0.40053577265845697