dataesr / bso-parser-html

Extract structured metadata (affiliations, author names and ORCIDs, keywords, ...) from raw HTML pages

bso-parser-html

Extract structured metadata from raw HTML.

The extracted metadata includes, when available:

  • affiliations
  • keywords
  • author names
  • author affiliations
  • author ORCIDs
  • abstract
  • acknowledgments
  • funding
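
To make the idea concrete, here is a minimal sketch of pulling a few of these fields out of a raw HTML page. It is not this repository's implementation: the names `MetaTagCollector` and `extract_metadata` are hypothetical, and the sketch assumes the page exposes Highwire Press style `<meta name="citation_*">` tags, a convention many scholarly publishers follow but not all. It also assumes keywords are separated by semicolons. It uses only the Python standard library.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.authors = []   # one dict per citation_author tag, in page order
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").strip()
        if not content:
            return
        if name == "citation_author":
            self.authors.append({"name": content, "affiliations": [], "orcid": None})
        elif name == "citation_author_institution" and self.authors:
            # Assumption: an institution tag belongs to the closest preceding author.
            self.authors[-1]["affiliations"].append(content)
        elif name == "citation_author_orcid" and self.authors:
            self.authors[-1]["orcid"] = content
        elif name == "citation_keywords":
            # Assumption: keywords are separated by semicolons.
            self.keywords.extend(k.strip() for k in content.split(";") if k.strip())


def extract_metadata(html):
    """Return authors, de-duplicated affiliations, and keywords found in the page."""
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    affiliations = []
    for author in collector.authors:
        for affiliation in author["affiliations"]:
            if affiliation not in affiliations:
                affiliations.append(affiliation)
    return {
        "authors": collector.authors,
        "affiliations": affiliations,
        "keywords": collector.keywords,
    }


if __name__ == "__main__":
    sample = """
    <html><head>
      <meta name="citation_author" content="Ada Lovelace">
      <meta name="citation_author_institution" content="Example University">
      <meta name="citation_keywords" content="open science; metadata">
    </head><body></body></html>
    """
    print(extract_metadata(sample))
```

Fields such as the abstract, acknowledgments, and funding statements usually live in the page body rather than in meta tags, so extracting them generally needs publisher-specific rules that this sketch does not attempt.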


License: MIT License


Languages

  • Python 95.2%
  • HTML 1.5%
  • Jupyter Notebook 1.3%
  • Dockerfile 0.8%
  • JavaScript 0.6%
  • Makefile 0.5%
  • CSS 0.1%