TheScienceMuseum / heritage-connector

Heritage Connector: Transforming text into data to extract meaning and make connections

Home Page: https://www.sciencemuseumgroup.org.uk/projects/heritage-connector

V&A: run load, NER, NEL; train NEL

kdutia opened this issue

  • check data quality (joins etc)
  • run load
  • fix NEL for large datasets
  • label NEL training data
  • test model
  • (after #342) add disambiguating descriptions to training data
  • rewrite load function to be more memory efficient (so it doesn't rely on the full dataframe) -> ended up using pd.DataFrame(series.tolist()) instead of series.apply(pd.Series), and using categorical instead of string dtypes for descriptions
  • run with trained model
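
The memory-related change in the checklist above can be sketched roughly as follows. This is only an illustration of the two pandas techniques named in the issue, not the project's actual load function: the sample records, the column names (id, description) and the variable names are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample data: a Series whose values are dicts of fields,
# with a small number of distinct description strings repeated many times.
descriptions = ["painting", "sculpture", "photograph"]
records = pd.Series(
    [{"id": i, "description": descriptions[i % 3]} for i in range(1000)]
)

# Approach the issue moved away from: apply(pd.Series) constructs one
# Series object per row, which is slow and memory hungry on large inputs.
expanded_slow = records.apply(pd.Series)

# Approach the issue settled on: build the frame once from a list of dicts.
expanded = pd.DataFrame(records.tolist(), index=records.index)

# Store the repeated description strings as a categorical column, so each
# distinct string is kept once and rows hold small integer codes.
mem_as_strings = expanded["description"].memory_usage(deep=True)
expanded["description"] = expanded["description"].astype("category")
mem_as_category = expanded["description"].memory_usage(deep=True)

print(f"string column: {mem_as_strings} bytes")
print(f"categorical column: {mem_as_category} bytes")
```

The categorical conversion pays off only when descriptions repeat often; for a column of mostly unique strings it can use more memory than the plain string column.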

load with a threshold of 0.8