The model resulting from this work is available on Hugging Face as NER-fine-tuned-BETO.
Language: es
Datasets:
- conll2002
- Babelscape/wikineural
NER-fine-tuned-BETO is a NER model fine-tuned from BETO on the Spanish portions of the CoNLL-2002 and WikiNEuRal datasets. It was trained on the CoNLL-2002 train split (~8,320 sentences) together with a bootstrapped subset of WikiNEuRal: the dataset was re-labelled and only the sentences where all gold labels matched the model's predictions were kept. The model was evaluated on the CoNLL-2002 test split.
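The exact bootstrapping script is not published with this card; the sketch below shows the filtering idea only, under stated assumptions: the annotator checkpoint is a placeholder (in practice, a model first fine-tuned on CoNLL-2002 alone), and the WikiNEuRal split name and tag order follow its dataset card.

```python
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Placeholder checkpoint: in practice, a BETO model fine-tuned on CoNLL-2002 only.
checkpoint = "NazaGara/NER-fine-tuned-BETO"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint)

# WikiNEuRal tag order (assumption; see the Babelscape/wikineural dataset card).
LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
          "B-LOC", "I-LOC", "B-MISC", "I-MISC"]

def predicted_word_tags(tokens):
    """Predict one IOB tag per word, read off the first sub-token of each word."""
    enc = tokenizer(tokens, is_split_into_words=True,
                    truncation=True, return_tensors="pt")
    with torch.no_grad():
        pred_ids = model(**enc).logits[0].argmax(dim=-1).tolist()
    tags, prev = [], None
    for idx, word_id in enumerate(enc.word_ids()):
        if word_id is not None and word_id != prev:  # first sub-token of a word
            tags.append(model.config.id2label[pred_ids[idx]])
        prev = word_id
    return tags

wikineural = load_dataset("Babelscape/wikineural", split="train_es")
# Keep only the sentences where every predicted tag matches the gold tag.
filtered = wikineural.filter(
    lambda ex: predicted_word_tags(ex["tokens"]) == [LABELS[t] for t in ex["ner_tags"]]
)
```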
The training data uses the following label set:
| Abbreviation | Description |
|---|---|
| O | Outside of NE |
| PER | Person’s name |
| ORG | Organization |
| LOC | Location |
| MISC | Miscellaneous |
Alongside these labels, the IOB format is used:
- B-LABEL if the word is at the beginning of the entity.
- I-LABEL if the word is part of the entity, but not its first word.
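For example, the sentence from the usage example below would be tagged word by word as:

```
Ignacio   B-PER
se        O
fue       O
de        O
viaje     O
por       O
Buenos    B-LOC
aires     I-LOC
```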
Load the model and its tokenizer:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("NazaGara/NER-fine-tuned-BETO")
model = AutoModelForTokenClassification.from_pretrained("NazaGara/NER-fine-tuned-BETO")

# "simple" aggregation merges the sub-tokens of an entity into a single span.
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
nlp("Ignacio se fue de viaje por Buenos aires")
```
```
[{'entity_group': 'PER',
  'score': 0.9997764,
  'word': 'Ignacio',
  'start': 0,
  'end': 7},
 {'entity_group': 'LOC',
  'score': 0.9997932,
  'word': 'Buenos aires',
  'start': 28,
  'end': 40}]
```
Overall

| precision | recall | f1-score |
|---|---|---|
| 0.9833 | 0.8950 | 0.8998 |
By class

| class | precision | recall | f1-score |
|---|---|---|---|
| O | 0.9958 | 0.9965 | 0.990 |
| B-PER | 0.9572 | 0.9741 | 0.9654 |
| I-PER | 0.9487 | 0.9921 | 0.9699 |
| B-ORG | 0.8823 | 0.9264 | 0.9038 |
| I-ORG | 0.9253 | 0.9264 | 0.9117 |
| B-LOC | 0.8967 | 0.8736 | 0.8850 |
| I-LOC | 0.8870 | 0.8215 | 0.8530 |
| B-MISC | 0.7541 | 0.7964 | 0.7747 |
| I-MISC | 0.9026 | 0.7827 | 0.8384 |
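The per-tag table above looks like the output of a token-level classification report. Below is a minimal sketch with scikit-learn, assuming the gold and predicted IOB tags for the CoNLL-2002 test split have already been collected (the actual evaluation script is not included in this card, and the tag sequences here are toy placeholders):

```python
from sklearn.metrics import classification_report

# Toy placeholders: one tag per word, sentence by sentence.
gold = [["B-PER", "O", "O", "B-LOC", "I-LOC"], ["B-ORG", "I-ORG", "O"]]
pred = [["B-PER", "O", "O", "B-LOC", "I-LOC"], ["B-ORG", "I-ORG", "O"]]

# Flatten the sentence-level sequences into single token-level lists.
y_true = [tag for sentence in gold for tag in sentence]
y_pred = [tag for sentence in pred for tag in sentence]

# Per-tag precision/recall/f1, like the "By class" table.
print(classification_report(y_true, y_pred, digits=4))
```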
This work is licensed under a Creative Commons Attribution 4.0 International License.