HPI-Information-Systems / art-ner-dataset

Data and code from the paper "Generation of Training Data for Named Entity Recognition of Artworks"

Generation of Training Data for Named Entity Recognition of Artworks

Data and pre-trained models from the paper "Generation of Training Data for Named Entity Recognition of Artworks", published in the Semantic Web Journal in 2023.

Data

Release of the data is pending approval and licensing by the owner of the corpus.

Models

The models can be downloaded from here.

SpaCy

The pre-trained spaCy model 'en_core_web_md' was used as a baseline for further training with domain-related annotations. The spaCy version is 3.3.0. Documentation for this model is available here.

To annotate a file of texts with the spaCy model (see spacy_model/example_file.csv), download the model folder and run the script spacy_model/run_spacy.py as follows:

python run_spacy.py model_location example_file.csv
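For readers who want to call the model from their own code, the following is a minimal sketch of what such an annotation step might look like. It is not the contents of run_spacy.py. The helper names annotate_texts and read_texts are hypothetical, and the CSV layout (a header row with a column named "text") is an assumption that may not match example_file.csv.

```python
import csv
import sys


def read_texts(csv_path, text_column="text"):
    """Read one text per row from a CSV file.

    Assumption: the file has a header row and a column called `text_column`.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [row[text_column] for row in csv.DictReader(handle)]


def annotate_texts(texts, nlp):
    """Yield (text, [(entity_text, entity_label), ...]) for each input text.

    `nlp` is any callable that returns an object with an `ents` attribute,
    such as a loaded spaCy pipeline.
    """
    for text in texts:
        doc = nlp(text)
        yield text, [(ent.text, ent.label_) for ent in doc.ents]


# Runs only when called as: python sketch.py model_location example_file.csv
if __name__ == "__main__" and len(sys.argv) == 3:
    import spacy  # imported here so the helpers above work without spaCy

    nlp = spacy.load(sys.argv[1])  # spacy.load accepts a path to a model folder
    for text, entities in annotate_texts(read_texts(sys.argv[2]), nlp):
        print(text, entities)
```

Passing the pipeline in as an argument keeps the annotation loop independent of how the model is loaded, so it can be checked with a stand-in callable before the model folder has been downloaded.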

Flair

The Flair model was trained using GloVe embeddings (en-glove) together with forward and backward Flair embeddings (news-X). More information on these embedding models can be found in Flair's documentation.

To run the model on a sentence, execute the script flair_model/RunNER.py with the following command:

python RunNER.py final-model.pt "This is a sentence"
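The same prediction can be made from Python code. The sketch below is an illustration and not the contents of RunNER.py. The helper names spans_to_tuples and tag_text are hypothetical, and the label type "ner" is an assumption about how the model was trained.

```python
import sys


def spans_to_tuples(sentence, label_type="ner"):
    """Return (span_text, label, score) for each predicted span.

    `sentence` is a Flair Sentence that has already been passed to
    tagger.predict(). The default label type "ner" is an assumption.
    """
    results = []
    for span in sentence.get_spans(label_type):
        label = span.get_label(label_type)
        results.append((span.text, label.value, round(label.score, 3)))
    return results


def tag_text(text, model_path, label_type="ner"):
    """Load a trained tagger from `model_path` and tag a single text."""
    # Imported here so spans_to_tuples can be used without Flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_path)
    sentence = Sentence(text)
    tagger.predict(sentence)
    return spans_to_tuples(sentence, label_type)


# Runs only when called as: python sketch.py final-model.pt "This is a sentence"
if __name__ == "__main__" and len(sys.argv) == 3:
    for span_text, label, score in tag_text(sys.argv[2], sys.argv[1]):
        print(span_text, label, score)
```

Loading the tagger is the slow step, so when tagging many sentences it would be preferable to load it once and reuse it rather than calling tag_text repeatedly.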
