Doccano Transformer helps you transform a dataset exported from doccano into the format of your favorite machine learning library.
Doccano Transformer supports the following formats:
- CoNLL 2003
- spaCy
To install doccano-transformer, simply use pip:
pip install doccano-transformer
The following example reads an exported JSONL file as a named entity recognition dataset and converts it to the CoNLL 2003 and spaCy formats:
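# Load a doccano JSONL export as a NER dataset.
from doccano_transformer.datasets import NERDataset
from doccano_transformer.utils import read_jsonl
dataset = read_jsonl(filepath='example.jsonl', dataset=NERDataset, encoding='utf-8')
# Convert to CoNLL 2003 format, tokenizing on whitespace.
dataset.to_conll2003(tokenizer=str.split)
# Convert to spaCy format with the same tokenizer.
dataset.to_spacy(tokenizer=str.split)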
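The example above passes str.split as the tokenizer, which splits on whitespace only and leaves punctuation attached to words. Assuming the tokenizer argument accepts any callable that takes a string and returns a list of tokens (as the use of str.split suggests), a custom tokenizer could be sketched as follows. The name regex_tokenizer is hypothetical and not part of doccano-transformer.

```python
import re


def regex_tokenizer(text):
    """Hypothetical tokenizer: split text into words and standalone punctuation marks."""
    # \w+ matches runs of word characters; [^\w\s] matches a single
    # character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)


# Compare with whitespace splitting on the same sentence.
sentence = "Hello, world!"
whitespace_tokens = sentence.split()
regex_tokens = regex_tokenizer(sentence)
```

If that assumption holds, the function could be passed as tokenizer=regex_tokenizer in place of str.split in the calls above.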
We encourage you to contribute to Doccano Transformer! Please check out the Contributing to Doccano Transformer guide for guidelines on how to proceed.