Is LMv2 using ResNeXt-101 or just ResNet-101?
Senwang98 opened this issue
Describe
Model I am using: LayoutLMv2
Hi, I'm doing some work based on LMv2, but I found that my FUNSD F1 score is lower than the one reported in the paper.
So, I want to know which visual backbone the open-source code is using now.
It seems that the provided detectron2 config uses ResNet-101 rather than ResNeXt-101, which would explain the lower F1 score.
I am not sure whether this reading is right or not.
BTW, I also want to know: if I train on the CORD dataset with BERT-base for more epochs, could the BERT-base F1 score end up higher than BERT-Large's?
(When I reproduced CORD on LMv2, I trained LMv2 for 10 epochs rather than 5, and my BERT-base test F1 score was 96.3.)
Hi @Senwang98, we use ResNeXt-101 as the visual backbone. Loading the weights from the pre-trained model directly is fine; just make sure the parameters are loaded correctly.
For the 2nd question, I think the first step is to check the evaluation and data-processing code. If there is nothing wrong in the code, try to verify whether this phenomenon can be reproduced with other models (such as BERT).
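To illustrate the "make sure parameters are loaded correctly" check: with PyTorch, `load_state_dict(strict=False)` returns the keys that failed to match, so you can spot backbone weights that were silently dropped. This is a minimal sketch with a toy module and made-up key names, not the actual LayoutLMv2 checkpoint:

```python
import torch
from torch import nn

# Toy stand-ins for the model and a checkpoint whose keys don't fully
# line up -- mimicking a backbone whose weights fail to load.
model = nn.Sequential(nn.Linear(4, 4))
ckpt = {"0.weight": torch.zeros(4, 4), "backbone.stem.weight": torch.zeros(1)}

result = model.load_state_dict(ckpt, strict=False)
print("missing:", result.missing_keys)        # params the checkpoint lacked
print("unexpected:", result.unexpected_keys)  # checkpoint keys the model ignored
```

If `missing_keys` contains the visual-backbone parameters, the model is running with randomly initialized vision weights, which would show up as a lower F1.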
@Dod-o
Ok, thanks for your reply!
For Q1, I think you are right.
For Q2, I am new to NLP; I will check my code.
BTW, do you have any plan to release training code for CORD, just like for FUNSD?
@Dod-o
I re-trained LMv2 (layoutlmv2-base-uncased) on CORD: 5 epochs achieved 95.4 and 10 epochs got 96.03.
***** eval metrics *****
epoch = 10.0
eval_accuracy = 0.9776
eval_f1 = 0.9741
eval_loss = 0.1345
eval_precision = 0.9756
eval_recall = 0.9725
eval_runtime = 0:00:02.73
eval_samples_per_second = 36.594
eval_steps_per_second = 4.757
***** Running Prediction *****
Num examples = 100
Batch size = 8
***** test metrics *****
test_accuracy = 0.9728
test_f1 = 0.9603
test_loss = 0.162
test_precision = 0.9592
test_recall = 0.9614
test_runtime = 0:00:02.70
test_samples_per_second = 36.974
test_steps_per_second = 4.807
Hi @Senwang98, the CORD code is similar to the FUNSD code, so referring to the FUNSD code should be fine.
Also, I'd like to know how you calculated the F1. Did you use token-level F1 rather than entity-level?
@Dod-o
I copied the metric code from run_funsd.py.
So, I think I use token-level F1.
@Senwang98 I think we use entity-level metrics; maybe you can double-check this.
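To make the distinction concrete, here is a minimal pure-Python sketch (not the repo's actual metric code, which relies on an evaluation library) showing how token-level and entity-level F1 can disagree on the same BIO predictions:

```python
def extract_entities(tags):
    """Collect (label, start, end) spans from a BIO tag sequence."""
    entities, start, label = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        if tag.startswith("I-") and label == tag[2:]:
            continue  # current span continues
        if label is not None:
            entities.add((label, start, i - 1))
            label = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities

def f1(tp, n_pred, n_true):
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_true if n_true else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

true = ["B-PER", "I-PER", "O", "B-ORG"]
pred = ["B-PER", "O",     "O", "B-ORG"]  # one I-PER token missed

# Token-level micro F1 over non-O tags: 2 of 2 predictions correct,
# 2 of 3 gold tokens recovered.
tp_tok = sum(t == p != "O" for t, p in zip(true, pred))
token_f1 = f1(tp_tok, sum(p != "O" for p in pred), sum(t != "O" for t in true))

# Entity-level F1: a span counts only if label AND boundaries match exactly.
te, pe = extract_entities(true), extract_entities(pred)
entity_f1 = f1(len(te & pe), len(pe), len(te))
print(token_f1, entity_f1)  # 0.8 vs 0.5
```

The single missed I-PER token costs one token at token level but invalidates the whole PER span at entity level, so entity-level F1 (the stricter metric usually reported for FUNSD/CORD) comes out lower.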