ju-bezdek / slovakbert-conll2003-sk-ner

Source code for fine-tuning the SlovakBERT NLP model on the conll2003-sk-ner dataset

Description

Training procedure and evaluation for ju-bezdek/slovakbert-conll2003-sk-ner

Training

For local training, run

python src/train.py

For training on Azure, run

python train_on_azure.py create_ws -sub_id -ws -rg

then

python train_on_azure.py run_remote

Trained model usage

from transformers import pipeline
from spacy import displacy

model_path = "ju-bezdek/slovakbert-conll2003-sk-ner"

# With "max", sub-word tokens are merged into words and each word takes
# the label of its highest-scoring token.
aggregation_strategy = "max"
ner_pipeline = pipeline(task="ner", model=model_path, aggregation_strategy=aggregation_strategy)

input_sentence = "Ruský premiér Viktor Černomyrdin v piatok povedal, že prezident Boris Jeľcin , ktorý je na dovolenke mimo Moskvy , podporil mierový plán šéfa bezpečnosti Alexandra Lebedu pre Čečensko, uviedla tlačová agentúra Interfax"
ner_ents = ner_pipeline(input_sentence)
print(ner_ents)

# Entity labels for displaCy: skip id 0 (assumed to be the "O" tag),
# strip the "B-"/"I-" prefix and drop duplicates while keeping order.
id2label = ner_pipeline.model.config.id2label
ent_group_labels = list(dict.fromkeys(label[2:] for i, label in id2label.items() if i > 0))

options = {"ents": ent_group_labels}

# Convert the pipeline output to displaCy's manual entity format.
displacy_ents = [{"start": ent["start"], "end": ent["end"], "label": ent["entity_group"]} for ent in ner_ents]
# jupyter=True renders the visualization inline in a notebook.
displacy.render({"text": input_sentence, "ents": displacy_ents}, style="ent", options=options, jupyter=True, manual=True)

Result:

[Ruský | MISC] premiér [Viktor Černomyrdin | PER] v piatok povedal, že prezident [Boris Jeľcin | PER] , ktorý je na dovolenke mimo [Moskvy | LOC] , podporil mierový plán šéfa bezpečnosti [Alexandra Lebedu | PER] pre [Čečensko, | LOC] uviedla tlačová agentúra [Interfax | ORG]
[{
    'entity_group': 'MISC',
    'score': 0.82277083,
    'word': ' Ruský',
    'start': 0,
    'end': 5
}, {
    'entity_group': 'PER',
    'score': 0.9821574,
    'word': ' Viktor Černomyrdin',
    'start': 14,
    'end': 32
}, {
    'entity_group': 'PER',
    'score': 0.9796225,
    'word': ' Boris Jeľcin',
    'start': 64,
    'end': 76
}, {
    'entity_group': 'LOC',
    'score': 0.94837284,
    'word': ' Moskvy',
    'start': 106,
    'end': 112
}, {
    'entity_group': 'PER',
    'score': 0.94473803,
    'word': ' Alexandra Lebedu',
    'start': 154,
    'end': 170
}, {
    'entity_group': 'LOC',
    'score': 0.81060684,
    'word': ' Čečensko,',
    'start': 175,
    'end': 184
}, {
    'entity_group': 'ORG',
    'score': 0.9785074,
    'word': ' Interfax',
    'start': 210,
    'end': 218
}]
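
In the output above, each `word` keeps a leading space, and the `Čečensko,` span (175 to 184) includes the trailing comma. The following is a minimal sketch of one way to post-process such a result: it re-reads each surface form from the input using the `start`/`end` offsets, trims surrounding whitespace and punctuation, and groups the entities by label. The helper names `clean_entities` and `group_by_label` and the set of stripped characters are assumptions made for illustration and are not part of the repository. The entity list is copied from the result above, with `score` and `word` omitted for brevity.

```python
from collections import defaultdict

def clean_entities(text, entities, strip_chars=" ,.;:"):
    """Re-read each entity from `text` and trim punctuation/whitespace.

    Hypothetical helper: `strip_chars` is an illustrative choice.
    """
    cleaned = []
    for ent in entities:
        start, end = ent["start"], ent["end"]
        surface = text[start:end]
        # Trim the right side and move the end offset back accordingly.
        trimmed = surface.rstrip(strip_chars)
        end -= len(surface) - len(trimmed)
        # Trim the left side and move the start offset forward accordingly.
        stripped = trimmed.lstrip(strip_chars)
        start += len(trimmed) - len(stripped)
        if stripped:  # skip spans that were only punctuation
            cleaned.append({"label": ent["entity_group"], "text": stripped,
                            "start": start, "end": end})
    return cleaned

def group_by_label(cleaned):
    """Collect entity texts per label, keeping their order of appearance."""
    grouped = defaultdict(list)
    for ent in cleaned:
        grouped[ent["label"]].append(ent["text"])
    return dict(grouped)

input_sentence = "Ruský premiér Viktor Černomyrdin v piatok povedal, že prezident Boris Jeľcin , ktorý je na dovolenke mimo Moskvy , podporil mierový plán šéfa bezpečnosti Alexandra Lebedu pre Čečensko, uviedla tlačová agentúra Interfax"

# Offsets and labels taken from the printed result above.
ner_ents = [
    {"entity_group": "MISC", "start": 0, "end": 5},
    {"entity_group": "PER", "start": 14, "end": 32},
    {"entity_group": "PER", "start": 64, "end": 76},
    {"entity_group": "LOC", "start": 106, "end": 112},
    {"entity_group": "PER", "start": 154, "end": 170},
    {"entity_group": "LOC", "start": 175, "end": 184},
    {"entity_group": "ORG", "start": 210, "end": 218},
]

cleaned = clean_entities(input_sentence, ner_ents)
grouped = group_by_label(cleaned)
print(grouped)
```

Reading the text back from the offsets, rather than from `word`, avoids depending on how the tokenizer reconstructs whitespace.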
