kermitt2 / grobid

A machine learning software for extracting information from scholarly documents

Home Page: https://grobid.readthedocs.io

How to add new tags?

dlculver opened this issue

Hello,

I am interested in training my own Grobid models to work on documents from a domain other than scientific papers. At the moment, I want to train a header model to identify particular parties in my documents, but I am a bit confused about the process. As I understand it, I take some PDFs, use Grobid's batch mode to generate training and evaluation data, correct the annotations manually, and then train the model. What confuses me is how to add new tags to the TEI schemas. Where, in particular, do I need to add new tags in order to train a header model?

Thanks!
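The workflow described in the question (pre-annotate PDFs in batch mode, correct the output by hand, retrain) can be sketched as follows. This is a minimal sketch assuming a local Grobid build: the jar path and its `<version>` are placeholders, and the batch options (`-gH`, `-dIn`, `-dOut`, `-exe createTraining`) and the `train_header` Gradle task should be checked against the documentation for your Grobid version. The script only prints the commands; it does not run them.

```python
import shlex
from pathlib import Path

# Placeholder locations (assumptions): adjust to your own Grobid checkout and version.
GROBID_HOME = Path("grobid-home")
GROBID_JAR = Path("grobid-core/build/libs/grobid-core-<version>-onejar.jar")


def create_training_command(pdf_dir: Path, out_dir: Path, heap: str = "4G") -> list[str]:
    """Build the batch-mode command that pre-annotates PDFs as training files."""
    return [
        "java", f"-Xmx{heap}", "-jar", str(GROBID_JAR),
        "-gH", str(GROBID_HOME),
        "-dIn", str(pdf_dir),
        "-dOut", str(out_dir),
        "-exe", "createTraining",
    ]


def train_header_command() -> list[str]:
    """Build the Gradle task that retrains the header model from the corrected files."""
    return ["./gradlew", "train_header"]


if __name__ == "__main__":
    # Step 1: generate pre-annotated training files from your PDFs.
    print(shlex.join(create_training_command(Path("my_pdfs"), Path("my_training"))))
    # Step 2 (manual): correct the header files and add them to the header training corpus.
    # Step 3: retrain the header model from the updated corpus.
    print(shlex.join(train_header_command()))
```

Keeping the commands as argument lists (rather than one string) makes them easy to pass to `subprocess.run` later without shell-quoting problems.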

Dear @dlculver,
Thanks for your interest in Grobid. Modifying the training data can be complex at first.

Could you please explain a bit more in detail what you want to do?
By "add new tags", do you mean extending the existing tagset, or just using the existing tags for additional objects in the TEI?
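One way to tell the two cases apart is to list which elements your corrected header training files actually use. If a new element shows up that is not in the current header tagset, you are extending the tagset; if the inventory is unchanged, you are reusing existing tags. The sketch below assumes a simplified header training file where the labelled fields sit under `<front>`; the sample content is hypothetical, and wrapper elements such as `<byline>` are counted too, so the inventory only roughly matches the model's labels.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified header training fragment; real createTraining
# output is longer and may use more tags.
SAMPLE = """<tei xml:space="preserve">
  <teiHeader/>
  <text xml:lang="en">
    <front>
      <docTitle><titlePart>Service Agreement</titlePart></docTitle>
      <byline><docAuthor>ACME Corp.</docAuthor></byline>
    </front>
  </text>
</tei>"""


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://www.tei-c.org/ns/1.0}' from a tag."""
    return tag.rsplit("}", 1)[-1]


def tag_inventory(xml_text: str) -> Counter:
    """Count element names found under <front>, ignoring any namespace."""
    root = ET.fromstring(xml_text)
    front = next((el for el in root.iter() if local_name(el.tag) == "front"), None)
    if front is None:
        return Counter()
    # iter() yields <front> itself first, so skip it.
    return Counter(local_name(el.tag) for el in front.iter() if el is not front)


if __name__ == "__main__":
    print(tag_inventory(SAMPLE))
```

Running this over a folder of corrected files and diffing the result against the tags in the original Grobid training data gives a quick check of whether any new labels were introduced.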