TheScienceMuseum / heritage-connector-nlp

NLP tools for heritage collections

Home Page: https://www.sciencemuseumgroup.org.uk/projects/heritage-connector

heritage-connector-nlp

Text processing for the Heritage Connector: a set of NLP utilities for the heritage sector.

For more information see https://doi.org/10.1002/ail2.23.

--- IN DEVELOPMENT ---

Includes:

  • low-data extensions for information extraction: named entity recognition (NER), named entity linking (NEL) and relation classification
  • labelling (Label Studio)
  • test suite for models

Usage

Label Studio

Setting up (first time):

  1. Run label-studio start labelling --init. This starts Label Studio and opens a configuration wizard.
  2. Select Named Entity Recognition from the top menu and fill in the entity types you want to annotate.
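
The entity types entered in the wizard end up in a Label Studio labelling config. As a rough illustration, the sketch below generates a minimal NER config of that kind in Python. The entity types (PERSON, ORG, LOC, DATE) and the function name build_ner_config are assumptions made for the example, not names taken from this repository; the View, Labels, Label and Text tags follow Label Studio's standard NER template.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity types: replace with the ones you want to annotate.
ENTITY_TYPES = ["PERSON", "ORG", "LOC", "DATE"]


def build_ner_config(entity_types):
    """Return a minimal Label Studio NER labelling config as an XML string."""
    view = ET.Element("View")
    # The Labels block lists the entity types and points at the text to annotate.
    labels = ET.SubElement(view, "Labels", name="label", toName="text")
    for entity in entity_types:
        ET.SubElement(labels, "Label", value=entity)
    # "$text" refers to the "text" field of each imported document.
    ET.SubElement(view, "Text", name="text", value="$text")
    return ET.tostring(view, encoding="unicode")


if __name__ == "__main__":
    print(build_ner_config(ENTITY_TYPES))
```

Saving the printed XML to a file gives a config that could be loaded at start-up instead of being typed into the wizard.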

Running: Run label-studio start labelling from the root directory.

Useful parameters:

  • --sampling=uniform: have Label Studio show documents in a random order
  • --label-config label_studio_config_sample.xml: load config from a file
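
For scripted runs, the commands and parameters above can be assembled programmatically. The following is a minimal sketch, assuming Label Studio is installed and on the PATH; build_command is a hypothetical helper written for this example and is not part of this repository. It only uses the flags listed above.

```python
import shlex


def build_command(project="labelling", init=False, sampling=None, label_config=None):
    """Assemble a label-studio start command as an argument list."""
    cmd = ["label-studio", "start", project]
    if init:
        # First-time setup: opens the configuration wizard.
        cmd.append("--init")
    if sampling:
        # e.g. "uniform" to show documents in a random order.
        cmd.append(f"--sampling={sampling}")
    if label_config:
        # Load the labelling config from a file.
        cmd += ["--label-config", label_config]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        sampling="uniform",
        label_config="label_studio_config_sample.xml",
    )
    # Print the command; to launch it, pass cmd to subprocess.run
    # from the root directory.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems if a config path contains spaces.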

About

License: MIT License


Languages

Jupyter Notebook 89.5%, Python 10.5%