Support for Pre-trained ELMo Representations for Many Languages
kermitt2 opened this issue
ELMo embeddings give very good results for NER, usually much better than a simple RNN and better than or comparable to BERT/RoBERTa/etc. base transformers. They also use notably less memory than transformers and handle sequences of 3000 tokens well. Using them is fast for both training and labeling.
However, the set of ELMo embeddings currently available in TF format is very limited, so we could try to support the ELMoForManyLangs format to extend language coverage.
This is implemented in the following branch: https://github.com/kermitt2/delft/tree/elmoformanylangs
Using embeddings generated by ELMoForManyLangs instead of those from the ELMo BILM-TF implementation, concatenated with GloVe embeddings, shows no improvement over the simple BidLSTM-CRF with GloVe alone.
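For reference, here is a minimal sketch of what this concatenation looks like when using the `elmoformanylangs` package directly; the model directory, the GloVe path, and the helper names are placeholders for illustration, not the actual code in the branch above:

```python
import numpy as np
from elmoformanylangs import Embedder

# Placeholder paths: point to a downloaded ELMoForManyLangs model
# and to the GloVe file actually used (e.g. glove.840B.300d.txt).
ELMO_MODEL_DIR = "/path/to/elmoformanylangs/english"
GLOVE_PATH = "/path/to/glove.840B.300d.txt"
GLOVE_DIM = 300

def load_glove(path, dim):
    """Load GloVe vectors into a simple token -> vector dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue
            vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors

def concat_features(sentences, glove, embedder):
    """Return one (n_tokens, GLOVE_DIM + 1024) matrix per sentence:
    static GloVe vectors concatenated with contextual ELMo vectors."""
    # sents2elmo returns one (n_tokens, 1024) array per sentence
    # (average of the 3 ELMo layers by default).
    elmo_vectors = embedder.sents2elmo(sentences)
    features = []
    for tokens, elmo in zip(sentences, elmo_vectors):
        # lower-cased lookup and zero vector for OOV, for simplicity
        glove_vecs = np.stack([
            glove.get(t.lower(), np.zeros(GLOVE_DIM, dtype="float32"))
            for t in tokens
        ])
        features.append(np.concatenate([glove_vecs, elmo], axis=-1))
    return features

if __name__ == "__main__":
    embedder = Embedder(ELMO_MODEL_DIR)
    glove = load_glove(GLOVE_PATH, GLOVE_DIM)
    sents = [["John", "lives", "in", "New", "York", "."]]
    feats = concat_features(sents, glove, embedder)
    print(feats[0].shape)  # (6, 1324)
```

The resulting per-token feature matrices are what would feed the BidLSTM-CRF input layer in place of the GloVe-only features.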
- English CoNLL-2003:

architecture | embeddings | F1-score (10-fold) |
---|---|---|
BidLSTM_CRF | GloVe | 91.03 |
BidLSTM_CRF_ELMo | GloVe + ELMo BILM-TF | 92.57 |
BidLSTM_CRF_ELMo | GloVe + ELMoForManyLangs | 91.10 |
Note: as a warm-up and to try to stabilize the results, the ELMoForManyLangs embedding pass was run 3 times (this improved the F1-score from 90.87 to 91.10).
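It is not completely clear from the note whether the repeated passes are averaged or only used as a warm-up before keeping the last output; as an illustration of the averaging reading (reusing the `Embedder` from the sketch above), a small helper could look like this:

```python
import numpy as np

def stabilized_sents2elmo(embedder, sentences, n_passes=3):
    """Run the ELMoForManyLangs forward pass several times and average
    the per-token vectors across passes, to reduce run-to-run variance
    of the contextual embeddings.
    Note: this is one possible reading of the warm-up note above,
    not necessarily the exact procedure used in the experiment."""
    runs = [embedder.sents2elmo(sentences) for _ in range(n_passes)]
    # runs[k][i] is the (n_tokens, 1024) array for sentence i in pass k
    return [np.mean([run[i] for run in runs], axis=0)
            for i in range(len(sentences))]
```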
- French Le Monde corpus (FTB):

architecture | embeddings | F1-score (10-fold) |
---|---|---|
BidLSTM_CRF | wikifr (fastText) | 89.45 |
BidLSTM_CRF_ELMo | wikifr + FrELMo (BILM-TF) | 90.96 |
BidLSTM_CRF_ELMo | wikifr + ELMoForManyLangs | 88.65 |
These embeddings are not effective; they apparently add no value here, so I am closing the issue.