josecannete / beto-benchmarking

Evaluation of Beto - Spanish BERT


This is a work in progress.

Beto Benchmarks

The following table shows BETO results on the Spanish version of each task/benchmark, with links to the hyperparameter settings used in each experiment. We compare BETO (cased and uncased) against the best Multilingual BERT result we found in the literature (as of October 2019), highlighting the results whenever BETO outperforms Multilingual BERT. The table also shows some alternative methods for the same tasks (not necessarily BERT-based). References for all methods are included below.

| Task | BETO-cased | BETO-uncased | Best Multilingual BERT | Other results |
|--------|------------|--------------|------------------------|-------------------------------|
| XNLI | ----- | 80.15 | 78.50 [2] | 80.80 [5], 77.80 [1], 73.15 [4] |
| POS | 98.97 | 98.44 | 97.10 [2] | 98.91 [6], 96.71 [3] |
| PAWS-X | 89.05 | 89.55 | 90.70 [8] | |
| NER-C | 87.24 | 82.67 | 87.38 [2] | 87.18 [3] |
| NER-W | ----- | ----- | 92.50 [7] | |
| MLDoc | 95.27 | 95.25 | 95.70 [2] | 88.75 [4] |
| DepPar | ----- | ----- | 92.3/86.5 [2] | |
| MLQA | ----- | ----- | 64.3/46.6 [9] | 68.0/49.8 [10] |
| XQuAD | ----- | ----- | 74.30 [11] | |

References

Summary

XNLI

Reported

73.15 LASER

Ours

Best: 80.15

Detailed: experiments_XNLI.txt

POS

Reported

Ours

Best: 98.44

Detailed: experiments_POS.txt

NER CoNLL2002

Reported

Ours

Best: 81.7

Detailed: experiments_NER.txt
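The NER scores above are entity-level F1 on CoNLL-style BIO-tagged data. As a reminder of how that metric is computed, here is a minimal sketch (it assumes well-formed BIO sequences; the tag sequences in the example are made-up, not taken from the benchmark):

```python
# Entity-level precision/recall/F1 for CoNLL-style BIO tags.
# An entity counts as correct only if its span AND type match exactly.

def extract_spans(tags):
    """Collect (start, end, type) spans from a BIO tag sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return set(spans)

def f1_score(gold_tags, pred_tags):
    gold = extract_spans(gold_tags)
    pred = extract_spans(pred_tags)
    tp = len(gold & pred)                       # exact span+type matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "B-ORG"]
print(f1_score(gold, pred))  # 0.5: one of two gold entities matched exactly
```

In practice the official `conlleval` script (or an equivalent) is used; the point of the sketch is only that partial matches and wrong entity types score zero.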

NER WikiAnn

Reported

Ours

MLDoc

Reported

88.75 LASER

Ours

Dependency Parsing

Reported

Ours

PAWS-X

Reported

Ours

Best: 89.55

Detailed: experiments_PAWSX.txt

MLQA

Reported

XQuAD

Reported

Ours
