microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Home Page: https://aka.ms/GeneralAI


Is LMv2 using resnext101 or just res101?

Senwang98 opened this issue

Describe
Model I am using (LayoutLMv2):

Hi, I'm doing some work based on LayoutLMv2, but the F1 score I get on FUNSD is lower than the one reported in the paper.
So I want to know which visual backbone the open-source code is using now.
It seems that the provided detectron2 config uses ResNet-101 rather than ResNeXt-101, which would explain the lower F1 score.
I am not sure whether my reading is right.

BTW, I'd like to know: if I train on the CORD dataset with BERT-base for more epochs, can the BERT-base F1 score end up higher than BERT-Large's?
(When I reproduced CORD on LayoutLMv2, I trained for 10 epochs rather than 5, and found my BERT-base test F1 score = 96.3.)

Dod-o commented:

Hi @Senwang98, we use ResNeXt-101 as the visual backbone. Loading the weights directly from the pre-trained model is fine; just make sure the parameters are loaded correctly.
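
(For anyone double-checking this: below is a minimal sketch of how to confirm the backbone and verify weight loading, assuming the HuggingFace `transformers` port of LayoutLMv2. The `detectron2_config_args` attribute and key names come from that implementation, not from this thread; adapt if you are on the original unilm code.)

```python
# Minimal sketch: confirm the visual backbone of a LayoutLMv2 checkpoint.
# Assumes the HuggingFace `transformers` port; requires torch + detectron2
# installed to instantiate the model (config inspection alone does not).
from transformers import LayoutLMv2Config, LayoutLMv2Model

config = LayoutLMv2Config.from_pretrained("microsoft/layoutlmv2-base-uncased")

# NUM_GROUPS > 1 (e.g. 32) together with WIDTH_PER_GROUP (e.g. 8) indicates
# ResNeXt-101 32x8d; NUM_GROUPS == 1 would be a plain ResNet-101.
for key in ("MODEL.RESNETS.DEPTH",
            "MODEL.RESNETS.NUM_GROUPS",
            "MODEL.RESNETS.WIDTH_PER_GROUP"):
    print(key, config.detectron2_config_args.get(key))

# output_loading_info=True reports missing/unexpected keys, which is an easy
# way to verify the backbone parameters were actually loaded.
model, info = LayoutLMv2Model.from_pretrained(
    "microsoft/layoutlmv2-base-uncased", output_loading_info=True
)
print("missing:", info["missing_keys"])
print("unexpected:", info["unexpected_keys"])
```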

For the second question, I think the first step is to check the evaluation and data-processing code. If there is nothing wrong there, try to confirm whether this phenomenon can be reproduced with other models (such as BERT).
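
One concrete way to audit the evaluation code is to feed it a tiny hand-made batch whose correct score you know in advance. Here is a minimal, self-contained sketch in the style of run_funsd.py's post-processing; the label names and logits are invented, and the -100 masking convention is the standard HuggingFace one:

```python
# Minimal sketch: audit run_funsd.py-style evaluation on a tiny hand-made
# batch. -100 marks positions (special tokens / subword continuations)
# that must be excluded before scoring.
import numpy as np
from seqeval.metrics import classification_report, f1_score

label_list = ["O", "B-HEADER", "I-HEADER", "B-ANSWER", "I-ANSWER"]

# Fake logits for a 1-sentence batch of 6 token positions.
predictions = np.array([[[0.9, 0, 0, 0, 0],    # -> O
                         [0, 0.9, 0, 0, 0],    # -> B-HEADER
                         [0, 0, 0.9, 0, 0],    # -> I-HEADER
                         [0.9, 0, 0, 0, 0],    # -> O
                         [0, 0, 0, 0.9, 0],    # -> B-ANSWER
                         [0.9, 0, 0, 0, 0]]])  # ignored (-100 below)
labels = np.array([[0, 1, 2, 0, 3, -100]])

preds = np.argmax(predictions, axis=2)

# The filtering step worth double-checking: drop every -100 position.
true_predictions = [
    [label_list[p] for p, l in zip(pred_row, label_row) if l != -100]
    for pred_row, label_row in zip(preds, labels)
]
true_labels = [
    [label_list[l] for p, l in zip(pred_row, label_row) if l != -100]
    for pred_row, label_row in zip(preds, labels)
]

print(classification_report(true_labels, true_predictions))
print("entity-level F1:", f1_score(true_labels, true_predictions))  # expect 1.0
```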

@Dod-o
Ok, thanks for your reply!
For Q1, I think you are right.
For Q2, I am new to NLP, so I will check my code.
BTW, do you have any plans to release training code for CORD, like the FUNSD code?

@Dod-o
I re-trained LayoutLMv2 (layoutlmv2-base-uncased) on CORD: 5 epochs reached 95.4 and 10 epochs got 96.03.

***** eval metrics *****
  epoch                   =       10.0
  eval_accuracy           =     0.9776
  eval_f1                 =     0.9741
  eval_loss               =     0.1345
  eval_precision          =     0.9756
  eval_recall             =     0.9725
  eval_runtime            = 0:00:02.73
  eval_samples_per_second =     36.594
  eval_steps_per_second   =      4.757
***** Running Prediction *****
  Num examples = 100
  Batch size = 8
***** test metrics *****
  test_accuracy           =     0.9728
  test_f1                 =     0.9603
  test_loss               =      0.162
  test_precision          =     0.9592
  test_recall             =     0.9614
  test_runtime            = 0:00:02.70
  test_samples_per_second =     36.974
  test_steps_per_second   =      4.807
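
(For context on the two runs above: assuming a run_funsd.py-style HuggingFace Trainer setup, the 5-epoch vs. 10-epoch difference comes down to a single training argument. The sketch below is illustrative, not the exact command used:)

```python
# Minimal sketch: the only change between the 5-epoch and 10-epoch runs
# above would be num_train_epochs (assuming an HF Trainer setup like
# run_funsd.py; the other values here are illustrative).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="layoutlmv2-cord",
    num_train_epochs=10,           # 5 -> 95.4 F1, 10 -> 96.03 F1 per the log
    per_device_eval_batch_size=8,  # matches "Batch size = 8" in the log
    evaluation_strategy="epoch",
)
```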
Dod-o commented:

Hi @Senwang98, the CORD code is similar to the FUNSD code, so referring to the FUNSD code is fine.

Also, I'd like to know how you calculated the F1. Did you use token-level F1 rather than entity-level?

@Dod-o
I copied the metric code from run_funsd.py.
So I think I use token-level F1.

Dod-o commented:

@Senwang98 I think we use entity-level metrics; maybe you can double-check this.
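
For reference, here is a small self-contained illustration of how the two metrics diverge on the same predictions; the BIO labels are invented for the example:

```python
# Minimal sketch: entity-level vs token-level F1 on the same predictions.
# Labels are invented; seqeval scores whole entity spans, so a partially
# matched span counts as fully wrong at the entity level.
from seqeval.metrics import f1_score as entity_f1

y_true = [["B-MENU", "I-MENU", "O", "B-TOTAL"]]
y_pred = [["B-MENU", "O",      "O", "B-TOTAL"]]

# Entity level: true entities are MENU(0-1) and TOTAL(3); predicted are
# MENU(0) and TOTAL(3). Only TOTAL matches exactly -> P = R = F1 = 0.5.
print("entity-level F1:", entity_f1(y_true, y_pred))

# Token level: 3 of 4 tags agree, so a per-token score looks much better.
correct = sum(t == p for t, p in zip(y_true[0], y_pred[0]))
print("token-level accuracy:", correct / len(y_true[0]))  # 0.75
```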