alicadaly / psClassify

Assigns a probability that a name in the PATSTAT database belongs to a person rather than a non-person entity (e.g., a company or university).

psClassify

A simple supervised learning algorithm that classifies PATSTAT records into two categories:

  • person names
  • not person names

psClassify_pre.py extracts the data and prepares it for model fitting.
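The kind of preparation this step implies can be sketched as follows. This is a minimal illustration only, not the contents of psClassify_pre.py: the function name `name_features`, the keyword set `ORG_TOKENS`, and the features themselves are assumptions made for the example.

```python
import re

# Hypothetical set of tokens that often appear in organisation names.
ORG_TOKENS = {"INC", "LTD", "GMBH", "CORP", "UNIVERSITY", "AG", "LLC", "CO"}


def name_features(name):
    """Turn a raw name string into a small dict of numeric features."""
    # Split the upper-cased name into alphabetic tokens.
    tokens = re.findall(r"[A-Za-z]+", name.upper())
    return {
        "n_tokens": len(tokens),
        "has_comma": int("," in name),  # "SURNAME, FORENAME" pattern
        "has_digit": int(any(ch.isdigit() for ch in name)),
        "has_org_token": int(any(t in ORG_TOKENS for t in tokens)),
    }


print(name_features("SMITH, JOHN A."))
print(name_features("ACME WIDGETS GMBH"))
```

Features of this sort give a classifier something numeric to work with; which signals actually separate persons from organisations in PATSTAT would need to be checked against labelled records.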

psClassify_R.r fits the model and saves the output to a .csv file.
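For readers who want to see the fit-and-save idea end to end, here is a rough sketch in Python (chosen to match the preparation script; the repository does this step in R). It assumes a plain logistic regression, which may not be the model psClassify_R.r uses. The training rows, the two features, and the column names `name` and `p_person` are all made up for illustration.

```python
import csv
import io
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row under the current parameters.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that feature row x belongs to a person (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy, invented training rows: [has_comma, has_org_token]; label 1 = person.
X = [[1, 0], [1, 0], [0, 0], [0, 1], [0, 1], [1, 1]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)

# Score two hypothetical names and write the probabilities as CSV.
names = ["SMITH, JOHN", "ACME GMBH"]
rows = [[1, 0], [0, 1]]

buf = io.StringIO()  # stand-in for a real file opened with newline=""
writer = csv.writer(buf)
writer.writerow(["name", "p_person"])
for name, x in zip(names, rows):
    writer.writerow([name, round(predict_proba(w, b, x), 4)])
print(buf.getvalue())
```

The output is one row per name with its estimated probability of being a person; a threshold such as 0.5 would then turn that probability into one of the two categories.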

Languages

Python 70.5%, R 29.5%