Sitoi / grobid2json

Process the XML files parsed by Grobid into JSON format.

Grobid2Json

This project extracts the code that converts Grobid XML into JSON from the s2orc-doc2json project and packages it as a PyPI package.

✨ Features

  • Process the XML files parsed by Grobid into JSON format.

📦 Installation

pip install grobid2json

🤯 Usage

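from bs4 import BeautifulSoup
from grobid2json import convert_xml_to_json

# Path to an XML file produced by Grobid
file_path = "test.xml"
with open(file_path, "rb") as f:
    xml_data = f.read()

# Parse with BeautifulSoup's XML parser (requires lxml)
soup = BeautifulSoup(xml_data, "xml")

# Use the file name up to its first dot as the paper ID, here "test"
paper_id = file_path.split("/")[-1].split(".")[0]

paper = convert_xml_to_json(soup, paper_id, "")
json_data = paper.as_json()
print(json_data)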
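
To convert a whole folder of Grobid output rather than a single file, the same steps can be wrapped in a small batch helper. The following is a minimal sketch, not part of the package: `paper_id_from_path`, `convert_directory`, and `grobid_convert` are hypothetical names introduced here, and the sketch assumes that `paper.as_json()` returns a JSON-serializable object such as a dict.

```python
import json
from pathlib import Path
from typing import Callable, List


def paper_id_from_path(file_path: str) -> str:
    """Return the file name up to its first dot, as in the usage example."""
    return Path(file_path).name.split(".")[0]


def convert_directory(
    xml_dir: str,
    out_dir: str,
    convert: Callable[[bytes, str], dict],
) -> List[Path]:
    """Convert every *.xml file in xml_dir and write one JSON file per paper."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for xml_path in sorted(Path(xml_dir).glob("*.xml")):
        paper_id = paper_id_from_path(str(xml_path))
        data = convert(xml_path.read_bytes(), paper_id)
        target = out / f"{paper_id}.json"
        target.write_text(
            json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8"
        )
        written.append(target)
    return written


def grobid_convert(xml_bytes: bytes, paper_id: str) -> dict:
    """Hypothetical adapter around the call shown in the usage example."""
    # Imported lazily so the helpers above load without the package installed.
    from bs4 import BeautifulSoup
    from grobid2json import convert_xml_to_json

    soup = BeautifulSoup(xml_bytes, "xml")
    # Assumption: as_json() returns something json.dumps can serialize.
    return convert_xml_to_json(soup, paper_id, "").as_json()
```

A call such as `convert_directory("tei", "json", grobid_convert)` would then write one `.json` file per XML file found in the (hypothetical) `tei` folder. Passing the converter in as a function keeps the file handling separate from the parsing, so the batch logic can be checked with a stub.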

🔗 Links

Credits


πŸ“ License

This project is licensed under the Apache License 2.0.
