| annotations_creators | language_creators | languages | licenses | multilinguality | pretty_name | size_categories | source_datasets | task_categories | task_ids |
|---|---|---|---|---|---|---|---|---|---|
| machine-generated, expert-generated | | | | | conll-2003-sk-ner | | | named-entity-recognition, part-of-speech-tagging | |

This is a translated version of the original CoNLL-2003 dataset (translated from English to Slovak via Google Translate). Annotation was done mostly automatically with word-matching scripts. Records in which some tags could not be matched were annotated manually (10%). Unlike the original CoNLL-2003 dataset, this one contains only NER tags.
NER labels:
- 0: O
- 1: B-PER
- 2: I-PER
- 3: B-ORG
- 4: I-ORG
- 5: B-LOC
- 6: I-LOC
- 7: B-MISC
- 8: I-MISC
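
The label ids above follow the BIO scheme, so they can be decoded into entity spans with a few lines of Python. The following is a minimal sketch, not code shipped with the dataset: the names `ID2LABEL`, `decode_tags`, and `extract_entities` are hypothetical helpers, and the example sentence is made up for illustration rather than taken from the data.

```python
# Label ids exactly as listed in this card.
ID2LABEL = {
    0: "O",
    1: "B-PER",
    2: "I-PER",
    3: "B-ORG",
    4: "I-ORG",
    5: "B-LOC",
    6: "I-LOC",
    7: "B-MISC",
    8: "I-MISC",
}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode_tags(tag_ids):
    """Convert a sequence of integer tag ids to BIO label strings."""
    return [ID2LABEL[i] for i in tag_ids]


def extract_entities(tokens, tag_ids):
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    entities = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, decode_tags(tag_ids)):
        # A new span starts on B-*, or on an I-* that does not continue
        # the span currently being collected.
        starts_new = label.startswith("B-") or (
            label.startswith("I-") and label[2:] != current_type
        )
        if label == "O" or starts_new:
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if label != "O":
            if starts_new:
                current_type = label[2:]
            current_tokens.append(token)
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Hypothetical example sentence, not taken from the dataset.
example_tokens = ["Peter", "Novák", "pracuje", "v", "Bratislave", "."]
example_tags = [1, 2, 0, 0, 5, 0]
print(extract_entities(example_tokens, example_tags))
```

`LABEL2ID` is the inverse mapping, which is what most token-classification training code expects when converting string labels back to ids.
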
Language: sk

Splits: train, test, val

Source dataset: https://huggingface.co/datasets/conll2003

Annotation process:
- Machine translation
- Machine pairing of tags using reverse translation and hardcoded rules (including phrase regex matching, etc.)
- Manual annotation of records that could not be matched automatically
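
The tag-pairing step can be pictured roughly as follows. This is only an illustrative sketch under stated assumptions, not the scripts actually used to build the dataset: `project_tags` is a hypothetical function, it assumes each English entity already has a candidate Slovak phrase, and it uses plain case-insensitive exact matching in place of the reverse translation and hardcoded rules mentioned above. Phrases that cannot be found are returned separately, which corresponds to records being routed to manual annotation.

```python
import re


def project_tags(tokens, entity_phrases):
    """Assign BIO tags to tokens by matching known entity phrases.

    tokens: list of Slovak tokens.
    entity_phrases: list of (phrase, entity_type) pairs, where phrase is an
        assumed Slovak rendering of an entity from the English source.
    Returns (tags, unmatched): BIO tags for every token, plus the phrases
    that were not found and would need manual annotation.
    """
    tags = ["O"] * len(tokens)
    unmatched = []
    lowered = [t.lower() for t in tokens]
    for phrase, entity_type in entity_phrases:
        words = re.findall(r"\w+", phrase.lower())
        n = len(words)
        found = False
        for start in range(len(tokens) - n + 1):
            # Only tag windows that are still untagged and match exactly.
            window_free = all(tag == "O" for tag in tags[start:start + n])
            if n > 0 and window_free and lowered[start:start + n] == words:
                tags[start] = f"B-{entity_type}"
                for i in range(start + 1, start + n):
                    tags[i] = f"I-{entity_type}"
                found = True
                break
        if not found:
            unmatched.append((phrase, entity_type))
    return tags, unmatched


# Hypothetical example, not taken from the dataset.
sk_tokens = ["Peter", "Novák", "navštívil", "Nemecko", "."]
phrases = [("Peter Novák", "PER"), ("Nemecko", "LOC"), ("Európska únia", "ORG")]
tags, unmatched = project_tags(sk_tokens, phrases)
print(tags)
print(unmatched)
```

Exact matching like this fails whenever the translated sentence inflects an entity name, which is common in Slovak. That is presumably why the process above also relies on reverse translation, regex rules, and a manual pass.
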