Pantagrueliste / multi-saxon

Rapidly transform vast amounts of TEI XML files using the power of Saxon and multiprocessing

multi-saxon

multi-saxon quickly converts large numbers of TEI XML files into plain text. It uses Saxonica's SaxonC-HE processor to run XSLT 2.0 and 3.0 transformations in parallel. This lets users work around a key limitation of lxml, which, despite its speed, supports only XSLT 1.0.
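As a rough illustration of the approach described above, the sketch below runs one Saxon transformation per file across a pool of worker processes. It assumes the `saxonche` package (the Python distribution of SaxonC-HE) is installed. The function names `plan_jobs`, `transform_one`, and `transform_all`, as well as the `.txt` output naming, are hypothetical and are not taken from multi-saxon's own code.

```python
from multiprocessing import Pool
from pathlib import Path


def plan_jobs(xml_dir, xslt_path, out_dir):
    """Pair every .xml file in xml_dir with the stylesheet and an output path."""
    out_dir = Path(out_dir)
    return [
        (str(xml), str(xslt_path), str(out_dir / (xml.stem + ".txt")))
        for xml in sorted(Path(xml_dir).glob("*.xml"))
    ]


def transform_one(job):
    """Transform a single file; intended to run inside a worker process."""
    xml_path, xslt_path, out_path = job
    # Imported lazily: saxonche is only needed where a transformation runs.
    from saxonche import PySaxonProcessor

    with PySaxonProcessor(license=False) as proc:
        xslt = proc.new_xslt30_processor()
        executable = xslt.compile_stylesheet(stylesheet_file=xslt_path)
        text = executable.transform_to_string(source_file=xml_path)
    # transform_to_string may return None when there is no output.
    Path(out_path).write_text(text or "", encoding="utf-8")
    return out_path


def transform_all(xml_dir, xslt_path, out_dir, workers=None):
    """Run all planned jobs in parallel and return the written paths."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    jobs = plan_jobs(xml_dir, xslt_path, out_dir)
    # workers=None lets multiprocessing use one process per CPU core.
    with Pool(processes=workers) as pool:
        return pool.map(transform_one, jobs)
```

A call such as `transform_all("tei/", "tei-to-text.xsl", "out/")` (paths are placeholders) should be placed under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that spawn new processes. Creating a processor per file keeps the sketch simple; reusing one per worker would likely be faster.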

Features

  • Fast Transformations: Utilize the multiprocessing capabilities of your machine for simultaneous XML transformations.
  • Saxon Integration: Seamlessly process XML files using the renowned Saxon processor.
  • CSV Output: Generate comprehensive CSV reports containing relevant metadata about the processed XML TEI files.
  • Logging: Limited logging capabilities.
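To give an idea of what the CSV report could look like, here is a minimal sketch that reads the title and author from each file's `teiHeader` and writes one row per file. It uses only the standard library. The column set (`file`, `title`, `author`) and the function names are assumptions made for illustration; the metadata that multi-saxon actually records may differ.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_metadata(xml_path):
    """Read title and author from the teiHeader of one TEI P5 file."""
    root = ET.parse(xml_path).getroot()
    stmt = "tei:teiHeader/tei:fileDesc/tei:titleStmt/"

    def text_of(tag):
        node = root.find(stmt + tag, TEI_NS)
        if node is None:
            return ""
        # itertext() also collects text nested in child elements;
        # split/join normalizes runs of whitespace.
        return " ".join("".join(node.itertext()).split())

    return {
        "file": Path(xml_path).name,
        "title": text_of("tei:title"),
        "author": text_of("tei:author"),
    }


def write_report(xml_paths, csv_path):
    """Write one CSV row per TEI file and return the number of rows."""
    rows = [extract_metadata(p) for p in xml_paths]
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["file", "title", "author"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Only the first `title` and `author` in the `titleStmt` are read here; files with several authors would need `findall` instead of `find`.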

Limitations

  • multi-saxon is optimized for TEI P5 files. I do not plan on extending it to other frameworks.

Upcoming Features

  • A separate config.toml file to increase metadata customization.

Installation

  1. Ensure you have Python 3.x installed on your machine. If not, download and install Python.

  2. Clone this repository:

    git clone https://github.com/Pantagrueliste/multi-saxon.git

About

License: MIT License


Languages

Language: Python 100.0%