mlibrary / alma.tar.gz-to-marcinjson

Executable jar to turn alma's stupid .tar.gz files into jsonl.gz

alma.tar.gz-to-marcinjson

Turn alma marc-xml export files into nicer marc-in-json jsonl files

Alma exports marc-xml files as a bunch of <whatever>.tar.gz files, each of which has the single file <whatever>.xml in it.
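To make that layout concrete, here is a minimal sketch of pulling the single .xml entry out of such an archive using only the JDK. It is not this project's code: the class and method names are hypothetical, and a real tool would more likely use a tar library and stream the entry rather than buffer it. The sketch assumes a plain ustar-style header (name in the first 100 bytes, octal size at offset 124), that the .xml file is the first entry, and that it is small enough to hold in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, for illustration only.
final class SingleEntryTarGz {

    /** Name and contents of the first entry in the archive. */
    static final class Entry {
        final String name;
        final byte[] data;

        Entry(String name, byte[] data) {
            this.name = name;
            this.data = data;
        }
    }

    /** Reads the first tar entry from a gzip-compressed stream. */
    static Entry readFirstEntry(InputStream gzipped) throws IOException {
        try (InputStream tar = new GZIPInputStream(gzipped)) {
            // Every tar entry starts with a 512-byte header block.
            byte[] header = tar.readNBytes(512);
            if (header.length < 512) {
                throw new IOException("truncated tar header");
            }
            String name = cString(header, 0, 100);
            // The size is stored as ASCII octal digits at offset 124.
            String sizeField = cString(header, 124, 12).trim();
            if (sizeField.isEmpty()) {
                throw new IOException("no entry found in archive");
            }
            long size = Long.parseLong(sizeField, 8);
            if (size > Integer.MAX_VALUE - 8) {
                throw new IOException("entry too large to buffer in this sketch");
            }
            // The entry's bytes follow the header directly.
            byte[] data = tar.readNBytes((int) size);
            if (data.length != size) {
                throw new IOException("truncated tar entry");
            }
            return new Entry(name, data);
        }
    }

    /** Decodes a NUL-terminated ASCII field from the header. */
    private static String cString(byte[] buf, int offset, int maxLen) {
        int len = 0;
        while (len < maxLen && buf[offset + len] != 0) {
            len++;
        }
        return new String(buf, offset, len, StandardCharsets.US_ASCII);
    }
}
```

Archives that put a long-name or pax extended header before the file would need extra handling that this sketch leaves out.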

This is code that creates a fat .jar (i.e., all dependencies included) that will take any number of <whatever>.tar.gz files and produce <whatever>.jsonl.gz files in the directory you invoked the program from.
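The naming rule can be sketched in a few lines of Java. This is an illustration of the behavior described above, not the project's actual code: `OutputNames` and `outputFor` are hypothetical names, and leaving a name that lacks the .tar.gz suffix unchanged before appending .jsonl.gz is an assumption.

```java
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
final class OutputNames {

    /**
     * Maps <whatever>.tar.gz (in any directory) to <whatever>.jsonl.gz
     * inside workingDir, the directory the program was invoked from.
     */
    static Path outputFor(Path input, Path workingDir) {
        String fileName = input.getFileName().toString();
        String suffix = ".tar.gz";
        // Assumption: names without the suffix are used as-is.
        String base = fileName.endsWith(suffix)
                ? fileName.substring(0, fileName.length() - suffix.length())
                : fileName;
        return workingDir.resolve(base + ".jsonl.gz");
    }
}
```

For example, `outputFor(Path.of("/path/to/alma/export_1.tar.gz"), Path.of(""))` yields the relative path `export_1.jsonl.gz`, which resolves against the current working directory when the file is opened.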

Usage

This is an executable .jar file that only takes filenames to convert as arguments.

java -jar /path/to/alma.tar.gz-to-marcinjson /path/to/alma/*.tar.gz

Performance

It's not ridiculously fast (e.g., it doesn't use Jackson custom serializers like it should and isn't even multi-threaded), but it'll convert the University of Michigan's full export of some 14.5M records on my laptop in about 15 minutes.
